Preface

This  edition  of  Elementary  Linear  Algebra  gives  an  introductory  treatment  of  linear  algebra  that  is  suitable  for 
a first  undergraduate  course.  Its  aim  is  to  present  the  fundamentals  of  linear  algebra  in  the  clearest  possible 
way — sound  pedagogy  is  the  main  consideration.  Although  calculus  is  not  a prerequisite,  there  is  some 
optional  material  that  is  clearly  marked  for  students  with  a calculus  background.  If  desired,  that  material  can 
be  omitted  without  loss  of  continuity. 

Technology is not required to use this text, but for instructors who would like to use MATLAB, Mathematica,
Maple,  or  calculators  with  linear  algebra  capabilities,  we  have  posted  some  supporting  material  that  can  be 
accessed  at  either  of  the  following  Web  sites: 

www.howardanton.com 

www.wiley.com/college/anton


Summary  of  Changes  in  this  Edition 

This  edition  is  a major  revision  of  its  predecessor.  In  addition  to  including  some  new  material,  some  of  the  old 
material  has  been  streamlined  to  ensure  that  the  major  topics  can  all  be  covered  in  a standard  course.  These 
are  the  most  significant  changes: 

Vectors  in  2-space,  3-space,  and  n-space  Chapters  3 and  4 of  the  previous  edition  have  been  combined 
into  a single  chapter.  This  has  enabled  us  to  eliminate  some  duplicate  exposition  and  to  juxtapose  concepts 
in n-space with those in 2-space and 3-space, thereby conveying more clearly how n-space ideas generalize
those  already  familiar  to  the  student. 

New  Pedagogical  Elements  Each  section  now  ends  with  a Concept  Review  and  a Skills  mastery  that 
provide  the  student  a convenient  reference  to  the  main  ideas  in  that  section. 

New  Exercises  Many  new  exercises  have  been  added,  including  a set  of  True/False  exercises  at  the  end  of 
most  sections. 

Earlier  Coverage  of  Eigenvalues  and  Eigenvectors  The  chapter  on  eigenvalues  and  eigenvectors,  which 
was  Chapter  7 in  the  previous  edition,  is  Chapter  5 in  this  edition. 

Complex  Vector  Spaces  The  chapter  entitled  Complex  Vector  Spaces  in  the  previous  edition  has  been 
completely  revised.  The  most  important  ideas  are  now  covered  in  Section  5.3  and  Section  7.5  in  the  context 
of  matrix  diagonalization.  A brief  review  of  complex  numbers  is  included  in  the  Appendix. 

Quadratic  Forms  This  material  has  been  extensively  rewritten  to  focus  more  precisely  on  the  most 
important  ideas. 

New  Chapter  on  Numerical  Methods  In  the  previous  edition  an  assortment  of  topics  appeared  in  the  last 
chapter.  That  chapter  has  been  replaced  by  a new  chapter  that  focuses  exclusively  on  numerical  methods  of 
linear  algebra.  We  achieved  this  by  moving  those  topics  not  concerned  with  numerical  methods  elsewhere 
in  the  text. 

Singular-Value Decomposition In recognition of its growing importance, a new section on Singular-Value
Decomposition  has  been  added  to  the  chapter  on  numerical  methods. 


Internet  Search  and  the  Power  Method  A new  section  on  the  Power  Method  and  its  application  to 
Internet  search  engines  has  been  added  to  the  chapter  on  numerical  methods. 

Applications There is an expanded version of this text by Howard Anton and Chris Rorres entitled
Elementary Linear Algebra: Applications Version, 10th (ISBN 9780470432051), whose purpose is to
supplement  this  version  with  an  extensive  body  of  applications.  However,  to  accommodate  instructors  who 
asked  us  to  include  some  applications  in  this  version  of  the  text,  we  have  done  so.  These  are  generally  less 
detailed  than  those  appearing  in  the  Anton/Rorres  text  and  can  be  omitted  without  loss  of  continuity. 


Hallmark  Features 


Relationships  Among  Concepts  One  of  our  main  pedagogical  goals  is  to  convey  to  the  student  that  linear 
algebra  is  a cohesive  subject  and  not  simply  a collection  of  isolated  definitions  and  techniques.  One  way  in 
which  we  do  this  is  by  using  a crescendo  of  Equivalent  Statements  theorems  that  continually  revisit 
relationships  among  systems  of  equations,  matrices,  determinants,  vectors,  linear  transformations,  and 
eigenvalues.  To  get  a general  sense  of  how  we  use  this  technique  see  Theorems  1.5.3,  1.6.4,  2.3.8,  4.8.10, 
4.10.4  and  then  Theorem  5.1.6,  for  example. 

Smooth Transition to Abstraction Because the transition from R^n to general vector spaces is difficult for
many  students,  considerable  effort  is  devoted  to  explaining  the  purpose  of  abstraction  and  helping  the 
student  to  “visualize”  abstract  ideas  by  drawing  analogies  to  familiar  geometric  ideas. 

Mathematical  Precision  When  reasonable,  we  try  to  be  mathematically  precise.  In  keeping  with  the  level 
of  student  audience,  proofs  are  presented  in  a patient  style  that  is  tailored  for  beginners.  There  is  a brief 
section  in  the  Appendix  on  how  to  read  proof  statements,  and  there  are  various  exercises  in  which  students 
are  guided  through  the  steps  of  a proof  and  asked  for  justification. 

Suitability  for  a Diverse  Audience  This  text  is  designed  to  serve  the  needs  of  students  in  engineering, 
computer  science,  biology,  physics,  business,  and  economics  as  well  as  those  majoring  in  mathematics. 

Historical  Notes  To  give  the  students  a sense  of  mathematical  history  and  to  convey  that  real  people 
created  the  mathematical  theorems  and  equations  they  are  studying,  we  have  included  numerous  Historical 
Notes  that  put  the  topic  being  studied  in  historical  perspective. 


About  the  Exercises 


Graded  Exercise  Sets  Each  exercise  set  begins  with  routine  drill  problems  and  progresses  to  problems 
with  more  substance. 

True/False  Exercises  Most  exercise  sets  end  with  a set  of  True/False  exercises  that  are  designed  to  check 
conceptual  understanding  and  logical  reasoning.  To  avoid  pure  guessing,  the  students  are  required  to  justify 
their  responses  in  some  way. 

Supplementary  Exercise  Sets  Most  chapters  end  with  a set  of  supplementary  exercises  that  tend  to  be 
more  challenging  and  force  the  student  to  draw  on  ideas  from  the  entire  chapter  rather  than  a specific 
section. 


Supplementary  Materials  for  Students 

Student  Solutions  Manual  This  supplement  provides  detailed  solutions  to  most  theoretical  exercises  and 
to  at  least  one  nonroutine  exercise  of  every  type  (ISBN  9780470458228). 

Technology  Exercises  and  Data  Files  The  technology  exercises  that  appeared  in  the  previous  edition  have 
been  moved  to  the  Web  site  that  accompanies  this  text.  Those  exercises  are  designed  to  be  solved  using 
MATLAB,  Mathematica,  or  Maple  and  are  accompanied  by  data  files  in  all  three  formats.  The  exercises  and 
data  can  be  downloaded  from  either  of  the  following  Web  sites. 

www.howardanton.com 

www.wiley.com/college/anton


Supplementary  Materials  for  Instructors 

Instructor's  Solutions  Manual  This  supplement  provides  worked-out  solutions  to  most  exercises  in  the 
text  (ISBN  9780470458235). 

WileyPLUS™  This  is  Wiley's  proprietary  online  teaching  and  learning  environment  that  integrates  a 
digital  version  of  this  textbook  with  instructor  and  student  resources  to  fit  a variety  of  teaching  and  learning 
styles.  WileyPLUS  will  help  your  students  master  concepts  in  a rich  and  structured  environment  that  is 
available  to  them  24/7.  It  will  also  help  you  to  personalize  and  manage  your  course  more  effectively  with 
student  assessments,  assignments,  grade  tracking,  and  other  useful  tools. 

Your  students  will  receive  timely  access  to  resources  that  address  their  individual  needs  and  will 
receive  immediate  feedback  and  remediation  resources  when  needed. 

There  are  also  self-assessment  tools  that  are  linked  to  the  relevant  portions  of  the  text  that  will  enable 
your  students  to  take  control  of  their  own  learning  and  practice. 

WileyPLUS  will  help  you  to  identify  those  students  who  are  falling  behind  and  to  intervene  in  a 
timely  manner  without  waiting  for  scheduled  office  hours. 

More  information  about  WileyPLUS  can  be  obtained  from  your  Wiley  representative. 


A Guide  for  the  Instructor 

Although  linear  algebra  courses  vary  widely  in  content  and  philosophy,  most  courses  fall  into  two  categories 
— those  with  about  35-40  lectures  and  those  with  about  25-30  lectures.  Accordingly,  we  have  created  long 
and  short  templates  as  possible  starting  points  for  constructing  a course  outline.  Of  course,  these  are  just 
guides,  and  you  will  certainly  want  to  customize  them  to  fit  your  local  interests  and  requirements.  Neither  of 
these  sample  templates  includes  applications.  Those  can  be  added,  if  desired,  as  time  permits. 


                                                        Long Template    Short Template

Chapter 1: Systems of Linear Equations and Matrices     7 lectures       6 lectures
Chapter 2: Determinants                                 3 lectures       2 lectures
Chapter 3: Euclidean Vector Spaces                      4 lectures       3 lectures
Chapter 4: General Vector Spaces                        10 lectures      10 lectures
Chapter 5: Eigenvalues and Eigenvectors                 3 lectures       3 lectures
Chapter 6: Inner Product Spaces                         3 lectures       1 lecture
Chapter 7: Diagonalization and Quadratic Forms          4 lectures       3 lectures
Chapter 8: Linear Transformations                       3 lectures       2 lectures

Total:                                                  37 lectures      30 lectures
1 Systems of Linear Equations and Matrices
INTRODUCTION

Information  in  science,  business,  and  mathematics  is  often  organized  into  rows  and 
columns  to  form  rectangular  arrays  called  “matrices”  (plural  of  “matrix”).  Matrices  often 
appear  as  tables  of  numerical  data  that  arise  from  physical  observations,  but  they  occur  in 
various  mathematical  contexts  as  well.  For  example,  we  will  see  in  this  chapter  that  all  of 
the  information  required  to  solve  a system  of  equations  such  as 

\[
\begin{aligned}
5x + y &= 3 \\
2x - y &= 4
\end{aligned}
\]

is embodied in the matrix

\[
\begin{bmatrix}
5 & 1 & 3 \\
2 & -1 & 4
\end{bmatrix}
\]

and  that  the  solution  of  the  system  can  be  obtained  by  performing  appropriate  operations 
on  this  matrix.  This  is  particularly  important  in  developing  computer  programs  for  solving 
systems  of  equations  because  computers  are  well  suited  for  manipulating  arrays  of 
numerical  information.  However,  matrices  are  not  simply  a notational  tool  for  solving 
systems  of  equations;  they  can  be  viewed  as  mathematical  objects  in  their  own  right,  and 
there  is  a rich  and  important  theory  associated  with  them  that  has  a multitude  of  practical 
applications.  It  is  the  study  of  matrices  and  related  topics  that  forms  the  mathematical  field 
that  we  call  “linear  algebra.”  In  this  chapter  we  will  begin  our  study  of  matrices. 
1.1 Introduction to Systems of Linear Equations

Systems  of  linear  equations  and  their  solutions  constitute  one  of  the  major  topics  that  we  will  study  in  this 
course.  In  this  first  section  we  will  introduce  some  basic  terminology  and  discuss  a method  for  solving  such 
systems. 


Linear  Equations 

Recall  that  in  two  dimensions  a line  in  a rectangular  xy-coordinate  system  can  be  represented  by  an  equation  of 
the  form 

\[ ax + by = c \quad (a,\ b \text{ not both } 0) \]

and  in  three  dimensions  a plane  in  a rectangular  xyz-coordinate  system  can  be  represented  by  an  equation  of  the 
form 

\[ ax + by + cz = d \quad (a,\ b,\ c \text{ not all } 0) \]

These are examples of “linear equations,” the first being a linear equation in the variables x and y and the second
a linear equation in the variables x, y, and z. More generally, we define a linear equation in the n variables
$x_1, x_2, \ldots, x_n$ to be one that can be expressed in the form

\[ a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b \tag{1} \]

where $a_1, a_2, \ldots, a_n$ and b are constants, and the a's are not all zero. In the special cases where n = 2 or n = 3,
we will often use variables without subscripts and write linear equations as

\[ a_1 x + a_2 y = b \quad (a_1,\ a_2 \text{ not both } 0) \tag{2} \]

\[ a_1 x + a_2 y + a_3 z = b \quad (a_1,\ a_2,\ a_3 \text{ not all } 0) \tag{3} \]

In the special case where b = 0, Equation 1 has the form

\[ a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = 0 \tag{4} \]

which is called a homogeneous linear equation in the variables $x_1, x_2, \ldots, x_n$.

EXAMPLE  1 Linear  Equations 

Observe  that  a linear  equation  does  not  involve  any  products  or  roots  of  variables.  All  variables 
occur  only  to  the  first  power  and  do  not  appear,  for  example,  as  arguments  of  trigonometric, 
logarithmic,  or  exponential  functions.  The  following  are  linear  equations: 

\[ x + 3y = 7 \qquad x_1 - 2x_2 - 3x_3 + x_4 = 0 \]

\[ \tfrac{1}{2}x - y + 3z = -1 \qquad x_1 + x_2 + \cdots + x_n = 1 \]

The following are not linear equations:

\[ x + 3y^2 = 4 \qquad 3x + 2y - xy = 5 \]

\[ \sin x + y = 0 \qquad \sqrt{x_1} + 2x_2 + x_3 = 1 \]


A finite set of linear equations is called a system of linear equations or, more briefly, a linear system. The
variables are called unknowns. For example, system 5 that follows has unknowns x and y, and system 6 has
unknowns $x_1$, $x_2$, and $x_3$.

\[
\begin{aligned}
5x + y &= 3 \\
2x - y &= 4
\end{aligned}
\tag{5}
\]

\[
\begin{aligned}
4x_1 - x_2 + 3x_3 &= -1 \\
3x_1 + x_2 + 9x_3 &= -4
\end{aligned}
\tag{6}
\]


The double subscripting on the coefficients $a_{ij}$
of the unknowns gives their location in the
system: the first subscript indicates the equation
in which the coefficient occurs, and the second
indicates which unknown it multiplies. Thus, $a_{12}$
is in the first equation and multiplies $x_2$.


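A general linear system of m equations in the n unknowns $x_1, x_2, \ldots, x_n$ can be written as

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\tag{7}
\]

A solution of a linear system in n unknowns $x_1, x_2, \ldots, x_n$ is a sequence of n numbers $s_1, s_2, \ldots, s_n$ for which
the substitution

\[ x_1 = s_1, \quad x_2 = s_2, \quad \ldots, \quad x_n = s_n \]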

makes  each  equation  a true  statement.  For  example,  the  system  in  5 has  the  solution 

\[ x = 1, \quad y = -2 \]

and  the  system  in  6 has  the  solution 

\[ x_1 = 1, \quad x_2 = 2, \quad x_3 = -1 \]

These solutions can be written more succinctly as

\[ (1, -2) \quad \text{and} \quad (1, 2, -1) \]

in  which  the  names  of  the  variables  are  omitted.  This  notation  allows  us  to  interpret  these  solutions  geometrically 
as  points  in  two-dimensional  and  three-dimensional  space.  More  generally,  a solution 

\[ x_1 = s_1, \quad x_2 = s_2, \quad \ldots, \quad x_n = s_n \]

of a linear system in n unknowns can be written as

\[ (s_1, s_2, \ldots, s_n) \]

which is called an ordered n-tuple. With this notation it is understood that all variables appear in the same order
in each equation. If n = 2, then the n-tuple is called an ordered pair, and if n = 3, then it is called an ordered
triple.
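
The definition of a solution can be checked mechanically by substitution. The following is a minimal Python sketch, not part of the original text; the function name `is_solution` and the list-based encoding of a system are assumptions made only for illustration. It substitutes an ordered n-tuple into each equation of systems 5 and 6.

```python
def is_solution(coeffs, rhs, candidate):
    """Return True if the ordered n-tuple `candidate` satisfies every equation.

    coeffs[i][j] is the coefficient of the j-th unknown in equation i,
    and rhs[i] is the constant on the right side of equation i.
    """
    return all(
        sum(a * s for a, s in zip(row, candidate)) == b
        for row, b in zip(coeffs, rhs)
    )

# System 5: 5x + y = 3, 2x - y = 4
system5 = ([[5, 1], [2, -1]], [3, 4])
# System 6: 4x1 - x2 + 3x3 = -1, 3x1 + x2 + 9x3 = -4
system6 = ([[4, -1, 3], [3, 1, 9]], [-1, -4])

print(is_solution(*system5, (1, -2)))     # True
print(is_solution(*system6, (1, 2, -1)))  # True
print(is_solution(*system5, (1, 2)))      # False: 5(1) + 2 = 7, not 3
```

Exact integer arithmetic is used here; with floating-point entries an equality test of this kind would need a tolerance.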


Linear  Systems  with  Two  and  Three  Unknowns 


Linear  systems  in  two  unknowns  arise  in  connection  with  intersections  of  lines.  For  example,  consider  the  linear 
system 

\[
\begin{aligned}
a_1 x + b_1 y &= c_1 \\
a_2 x + b_2 y &= c_2
\end{aligned}
\]

in  which  the  graphs  of  the  equations  are  lines  in  the  xy-plane.  Each  solution  (x,  y)  of  this  system  corresponds  to  a 
point  of  intersection  of  the  lines,  so  there  are  three  possibilities  (Figure  1.1.1): 

The  lines  may  be  parallel  and  distinct,  in  which  case  there  is  no  intersection  and  consequently  no  solution. 
The  lines  may  intersect  at  only  one  point,  in  which  case  the  system  has  exactly  one  solution. 

The  lines  may  coincide,  in  which  case  there  are  infinitely  many  points  of  intersection  (the  points  on  the 
common  line)  and  consequently  infinitely  many  solutions. 


[Figure 1.1.1: three panels labeled "No solution," "One solution," and "Infinitely many solutions (coincident lines)"]

In  general,  we  say  that  a linear  system  is  consistent  if  it  has  at  least  one  solution  and  inconsistent  if  it  has  no 
solutions.  Thus,  a consistent  linear  system  of  two  equations  in  two  unknowns  has  either  one  solution  or  infinitely 
many  solutions — there  are  no  other  possibilities.  The  same  is  true  for  a linear  system  of  three  equations  in  three 
unknowns 

\[
\begin{aligned}
a_1 x + b_1 y + c_1 z &= d_1 \\
a_2 x + b_2 y + c_2 z &= d_2 \\
a_3 x + b_3 y + c_3 z &= d_3
\end{aligned}
\]

in  which  the  graphs  of  the  equations  are  planes.  The  solutions  of  the  system,  if  any,  correspond  to  points  where 
all  three  planes  intersect,  so  again  we  see  that  there  are  only  three  possibilities — no  solutions,  one  solution,  or 
infinitely  many  solutions  (Figure  1.1.2). 


[Figure 1.1.2: panels labeled "No solutions (three parallel planes; no common intersection)," "No solutions (two parallel planes; no common intersection)," "No solutions (two coincident planes parallel to the third; no common intersection)," "One solution (intersection is a point)," and "Infinitely many solutions (planes are all coincident; intersection is a plane)"]


We  will  prove  later  that  our  observations  about  the  number  of  solutions  of  linear  systems  of  two  equations  in  two 
unknowns  and  linear  systems  of  three  equations  in  three  unknowns  actually  hold  for  all  linear  systems.  That  is: 


Every system of linear equations has zero, one, or infinitely many solutions. There are no other
possibilities. 
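
For two equations in two unknowns, the three geometric cases can be told apart by arithmetic on the coefficients. The sketch below is an illustration only: the function name `classify_2x2` is hypothetical, and the cross-multiplication test it uses is not developed in this section. It assumes integer coefficients and that each equation really is a line (its two coefficients are not both 0).

```python
from fractions import Fraction

def classify_2x2(a1, b1, c1, a2, b2, c2):
    """Classify the system a1*x + b1*y = c1, a2*x + b2*y = c2.

    Assumes integer inputs and that each equation is a line
    (a1, b1 not both 0 and a2, b2 not both 0).
    """
    cross = a1 * b2 - a2 * b1
    if cross != 0:
        # The lines are not parallel, so they meet in exactly one point.
        x = Fraction(c1 * b2 - c2 * b1, cross)
        y = Fraction(a1 * c2 - a2 * c1, cross)
        return "one solution", (x, y)
    # The lines are parallel; they coincide exactly when one whole
    # equation is a multiple of the other.
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return "infinitely many solutions", None
    return "no solution", None

print(classify_2x2(1, -1, 1, 2, 1, 6))    # x - y = 1, 2x + y = 6
print(classify_2x2(1, 1, 4, 3, 3, 6))     # x + y = 4, 3x + 3y = 6
print(classify_2x2(4, -2, 1, 16, -8, 4))  # 4x - 2y = 1, 16x - 8y = 4
```

The three calls use the systems of Examples 2, 3, and 4 that follow, one for each case.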


EXAMPLE  2 A Linear  System  with  One  Solution 


Solve  the  linear  system 


\[
\begin{aligned}
x - y &= 1 \\
2x + y &= 6
\end{aligned}
\]

We can eliminate x from the second equation by adding -2 times the first equation to
the second. This yields the simplified system

\[
\begin{aligned}
x - y &= 1 \\
3y &= 4
\end{aligned}
\]

From the second equation we obtain $y = \tfrac{4}{3}$, and on substituting this value in the first equation we
obtain $x = 1 + y = \tfrac{7}{3}$. Thus, the system has the unique solution

\[ x = \tfrac{7}{3}, \quad y = \tfrac{4}{3} \]

Geometrically, this means that the lines represented by the equations in the system intersect at the
single point $\left(\tfrac{7}{3}, \tfrac{4}{3}\right)$. We leave it for you to check this by graphing the lines.
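
The elimination and back-substitution in Example 2 can be reproduced with exact rational arithmetic. This is a minimal sketch rather than a general solver; storing each equation as a three-entry list is an assumption made for illustration.

```python
from fractions import Fraction

# Each equation is stored as [coefficient of x, coefficient of y, right side].
eq1 = [Fraction(1), Fraction(-1), Fraction(1)]  # x - y = 1
eq2 = [Fraction(2), Fraction(1), Fraction(6)]   # 2x + y = 6

# Add -2 times the first equation to the second to eliminate x.
m = -eq2[0] / eq1[0]
eq2 = [m * p + q for p, q in zip(eq1, eq2)]     # now 0x + 3y = 4

y = eq2[2] / eq2[1]                  # solve 3y = 4
x = (eq1[2] - eq1[1] * y) / eq1[0]   # substitute y into x - y = 1
print(x, y)                          # 7/3 4/3
```

Using `Fraction` instead of floating-point numbers keeps the values 7/3 and 4/3 exact.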


EXAMPLE  3 A Linear  System  with  No  Solutions 


Solve  the  linear  system 


\[
\begin{aligned}
x + y &= 4 \\
3x + 3y &= 6
\end{aligned}
\]

We can eliminate x from the second equation by adding -3 times the first equation to
the second equation. This yields the simplified system

\[
\begin{aligned}
x + y &= 4 \\
0 &= -6
\end{aligned}
\]

The  second  equation  is  contradictory,  so  the  given  system  has  no  solution.  Geometrically,  this 
means  that  the  lines  corresponding  to  the  equations  in  the  original  system  are  parallel  and  distinct. 
We  leave  it  for  you  to  check  this  by  graphing  the  lines  or  by  showing  that  they  have  the  same  slope 
but  different  y-intercepts. 


EXAMPLE  4 A Linear  System  with  Infinitely  Many  Solutions 

Solve  the  linear  system 

\[
\begin{aligned}
4x - 2y &= 1 \\
16x - 8y &= 4
\end{aligned}
\]

We can eliminate x from the second equation by adding -4 times the first equation to
the second. This yields the simplified system

\[
\begin{aligned}
4x - 2y &= 1 \\
0 &= 0
\end{aligned}
\]

The second equation does not impose any restrictions on x and y and hence can be omitted. Thus,
the solutions of the system are those values of x and y that satisfy the single equation

\[ 4x - 2y = 1 \tag{8} \]

Geometrically, this means the lines corresponding to the two equations in the original system
coincide. One way to describe the solution set is to solve this equation for x in terms of y to obtain
$x = \tfrac{1}{4} + \tfrac{1}{2}y$ and then assign an arbitrary value t (called a parameter) to y. This allows us to
express the solution by the pair of equations (called parametric equations)

\[ x = \tfrac{1}{4} + \tfrac{1}{2}t, \quad y = t \]

We can obtain specific numerical solutions from these equations by substituting numerical values
for the parameter. For example, t = 0 yields the solution $\left(\tfrac{1}{4}, 0\right)$, t = 1 yields the solution $\left(\tfrac{3}{4}, 1\right)$,
and t = -1 yields the solution $\left(-\tfrac{1}{4}, -1\right)$. You can confirm that these are solutions by
substituting the coordinates into the given equations.

In Example 4 we could have also obtained
parametric equations for the solutions by
solving 8 for y in terms of x, and letting
x = t be the parameter. The resulting
parametric equations would look different
but would define the same solution set.
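
The substitution check suggested at the end of Example 4 can be automated. The following sketch is illustrative only; the helper names `point`, `point_alt`, and `satisfies_original` are hypothetical. It evaluates the parametric equations for several parameter values and also tries the alternative parametrization obtained by solving Equation 8 for y and letting x = t.

```python
from fractions import Fraction

def point(t):
    """Solution of Example 4 with y as the parameter: x = 1/4 + t/2, y = t."""
    t = Fraction(t)
    return (Fraction(1, 4) + t / 2, t)

def point_alt(t):
    """Alternative form with x as the parameter: x = t, y = 2t - 1/2."""
    t = Fraction(t)
    return (t, 2 * t - Fraction(1, 2))

def satisfies_original(x, y):
    """Check both equations of the original system."""
    return 4 * x - 2 * y == 1 and 16 * x - 8 * y == 4

for t in (0, 1, -1):
    print(t, point(t), satisfies_original(*point(t)))

# Every point produced by either parametrization lies on the same line.
print(all(satisfies_original(*point_alt(t)) for t in range(-3, 4)))  # True
```

The two parametrizations give different pairs for the same value of t, but each pair satisfies 4x - 2y = 1, so both describe the same solution set.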


EXAMPLE  5 A Linear  System  with  Infinitely  Many  Solutions 

Solve  the  linear  system 

\[
\begin{aligned}
x - y + 2z &= 5 \\
2x - 2y + 4z &= 10 \\
3x - 3y + 6z &= 15
\end{aligned}
\]

This system can be solved by inspection, since the second and third equations are
multiples of the first. Geometrically, this means that the three planes coincide and that those values
of x, y, and z that satisfy the equation

\[ x - y + 2z = 5 \tag{9} \]

automatically satisfy all three equations. Thus, it suffices to find the solutions of 9. We can do this
by first solving 9 for x in terms of y and z, then assigning arbitrary values r and s (parameters) to
these two variables, and then expressing the solution by the three parametric equations

\[ x = 5 + r - 2s, \quad y = r, \quad z = s \]

Specific solutions can be obtained by choosing numerical values for the parameters r and s. For
example, taking r = 1 and s = 0 yields the solution (6, 1, 0).


Augmented  Matrices  and  Elementary  Row  Operations 

As  the  number  of  equations  and  unknowns  in  a linear  system  increases,  so  does  the  complexity  of  the  algebra 
involved  in  finding  solutions.  The  required  computations  can  be  made  more  manageable  by  simplifying  notation 
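and standardizing procedures. For example, by mentally keeping track of the location of the +'s, the x's, and the
='s in the linear system

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\]

we can abbreviate the system by writing only the rectangular array of numbers

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{bmatrix}
\]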

As  noted  in  the  introduction  to  this  chapter,  the 
term  “matrix”  is  used  in  mathematics  to  denote  a 
rectangular  array  of  numbers.  In  a later  section 
we  will  study  matrices  in  detail,  but  for  now  we 
will  only  be  concerned  with  augmented  matrices 
for  linear  systems. 

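This is called the augmented matrix for the system. For example, the augmented matrix for the system of
equations

\[
\begin{aligned}
x_1 + x_2 + 2x_3 &= 9 \\
2x_1 + 4x_2 - 3x_3 &= 1 \\
3x_1 + 6x_2 - 5x_3 &= 0
\end{aligned}
\qquad \text{is} \qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
2 & 4 & -3 & 1 \\
3 & 6 & -5 & 0
\end{bmatrix}
\]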

The  basic  method  for  solving  a linear  system  is  to  perform  appropriate  algebraic  operations  on  the  system  that  do 
not  alter  the  solution  set  and  that  produce  a succession  of  increasingly  simpler  systems,  until  a point  is  reached 
where  it  can  be  ascertained  whether  the  system  is  consistent,  and  if  so,  what  its  solutions  are.  Typically,  the 
algebraic  operations  are  as  follows: 

Multiply  an  equation  through  by  a nonzero  constant. 

Interchange  two  equations. 

Add  a constant  times  one  equation  to  another. 

Since  the  rows  (horizontal  lines)  of  an  augmented  matrix  correspond  to  the  equations  in  the  associated  system, 
these  three  operations  correspond  to  the  following  operations  on  the  rows  of  the  augmented  matrix: 

Multiply  a row  through  by  a nonzero  constant. 

Interchange  two  rows. 

Add  a constant  times  one  row  to  another. 

These  are  called  elementary  row  operations  on  a matrix. 
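
The three elementary row operations translate directly into short functions. This is a minimal sketch under the assumption that a matrix is stored as a list of rows; the names `scale_row`, `swap_rows`, and `add_multiple` are hypothetical, and rows are numbered from 0 as is usual in Python.

```python
from fractions import Fraction

def scale_row(M, i, c):
    """Multiply row i through by a nonzero constant c; return a new matrix."""
    if c == 0:
        raise ValueError("the constant must be nonzero")
    return [[c * e for e in row] if k == i else list(row)
            for k, row in enumerate(M)]

def swap_rows(M, i, j):
    """Interchange rows i and j; return a new matrix."""
    N = [list(row) for row in M]
    N[i], N[j] = N[j], N[i]
    return N

def add_multiple(M, c, src, dst):
    """Add c times row src to row dst; return a new matrix."""
    N = [list(row) for row in M]
    N[dst] = [c * s + d for s, d in zip(M[src], M[dst])]
    return N

M = [[1, 1, 2, 9], [2, 4, -3, 1], [3, 6, -5, 0]]
print(add_multiple(M, -2, 0, 1)[1])          # [0, 2, -7, -17]
print(swap_rows(M, 0, 2)[0])                 # [3, 6, -5, 0]
print(scale_row(M, 1, Fraction(1, 2))[1])    # row [2, 4, -3, 1] halved
```

Each function returns a new matrix instead of modifying its argument, which makes it easy to compare the system before and after an operation.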

In  the  following  example  we  will  illustrate  how  to  use  elementary  row  operations  and  an  augmented  matrix  to 
solve  a linear  system  in  three  unknowns.  Since  a systematic  procedure  for  solving  linear  systems  will  be 
developed  in  the  next  section,  do  not  worry  about  how  the  steps  in  the  example  were  chosen.  Your  objective  here 
should  be  simply  to  understand  the  computations. 


EXAMPLE  6 Using  Elementary  Row  Operations 


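In the left column we solve a system of linear equations by operating on the equations in the
system, and in the right column we solve the same system by operating on the rows of the
augmented matrix.

\[
\begin{aligned}
x + y + 2z &= 9 \\
2x + 4y - 3z &= 1 \\
3x + 6y - 5z &= 0
\end{aligned}
\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
2 & 4 & -3 & 1 \\
3 & 6 & -5 & 0
\end{bmatrix}
\]

Add -2 times the first equation to the second (add -2 times the first row to the second) to obtain

\[
\begin{aligned}
x + y + 2z &= 9 \\
2y - 7z &= -17 \\
3x + 6y - 5z &= 0
\end{aligned}
\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
0 & 2 & -7 & -17 \\
3 & 6 & -5 & 0
\end{bmatrix}
\]

Add -3 times the first equation to the third (add -3 times the first row to the third) to obtain

\[
\begin{aligned}
x + y + 2z &= 9 \\
2y - 7z &= -17 \\
3y - 11z &= -27
\end{aligned}
\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
0 & 2 & -7 & -17 \\
0 & 3 & -11 & -27
\end{bmatrix}
\]

Multiply the second equation by $\tfrac{1}{2}$ (multiply the second row by $\tfrac{1}{2}$) to obtain

\[
\begin{aligned}
x + y + 2z &= 9 \\
y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\
3y - 11z &= -27
\end{aligned}
\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\
0 & 3 & -11 & -27
\end{bmatrix}
\]

Add -3 times the second equation to the third (add -3 times the second row to the third) to obtain

\[
\begin{aligned}
x + y + 2z &= 9 \\
y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\
-\tfrac{1}{2}z &= -\tfrac{3}{2}
\end{aligned}
\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\
0 & 0 & -\tfrac{1}{2} & -\tfrac{3}{2}
\end{bmatrix}
\]

Multiply the third equation by -2 (multiply the third row by -2) to obtain

\[
\begin{aligned}
x + y + 2z &= 9 \\
y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\
z &= 3
\end{aligned}
\qquad
\begin{bmatrix}
1 & 1 & 2 & 9 \\
0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\
0 & 0 & 1 & 3
\end{bmatrix}
\]

Add -1 times the second equation to the first (add -1 times the second row to the first) to obtain

\[
\begin{aligned}
x + \tfrac{11}{2}z &= \tfrac{35}{2} \\
y - \tfrac{7}{2}z &= -\tfrac{17}{2} \\
z &= 3
\end{aligned}
\qquad
\begin{bmatrix}
1 & 0 & \tfrac{11}{2} & \tfrac{35}{2} \\
0 & 1 & -\tfrac{7}{2} & -\tfrac{17}{2} \\
0 & 0 & 1 & 3
\end{bmatrix}
\]

Add $-\tfrac{11}{2}$ times the third equation to the first and $\tfrac{7}{2}$ times the third equation to the second
(add $-\tfrac{11}{2}$ times the third row to the first and $\tfrac{7}{2}$ times the third row to the second) to obtain

\[
\begin{aligned}
x &= 1 \\
y &= 2 \\
z &= 3
\end{aligned}
\qquad
\begin{bmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 2 \\
0 & 0 & 1 & 3
\end{bmatrix}
\]

The solution x = 1, y = 2, z = 3 is now evident.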
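
The row operations of Example 6 can be replayed in code to confirm each intermediate matrix. The sketch below is an illustration under stated assumptions: the helper names `add_multiple` and `scale` are hypothetical, rows are numbered from 0, and exact fractions are used so that entries such as -7/2 are not rounded.

```python
from fractions import Fraction as F

def add_multiple(M, c, src, dst):
    """Add c times row src to row dst (in place)."""
    M[dst] = [c * s + d for s, d in zip(M[src], M[dst])]

def scale(M, c, i):
    """Multiply row i through by the nonzero constant c (in place)."""
    M[i] = [c * e for e in M[i]]

# Augmented matrix of x + y + 2z = 9, 2x + 4y - 3z = 1, 3x + 6y - 5z = 0
M = [[F(1), F(1), F(2), F(9)],
     [F(2), F(4), F(-3), F(1)],
     [F(3), F(6), F(-5), F(0)]]

add_multiple(M, -2, 0, 1)         # -2 times row 1 added to row 2
add_multiple(M, -3, 0, 2)         # -3 times row 1 added to row 3
scale(M, F(1, 2), 1)              # row 2 multiplied by 1/2
add_multiple(M, -3, 1, 2)         # -3 times row 2 added to row 3
scale(M, -2, 2)                   # row 3 multiplied by -2
add_multiple(M, -1, 1, 0)         # -1 times row 2 added to row 1
add_multiple(M, F(-11, 2), 2, 0)  # -11/2 times row 3 added to row 1
add_multiple(M, F(7, 2), 2, 1)    # 7/2 times row 3 added to row 2

x, y, z = (row[-1] for row in M)
print(x, y, z)                    # 1 2 3
```

The order of the calls follows the example exactly; a systematic rule for choosing the steps is a separate matter and is not attempted here.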


Maxime  Bocher  (1867-1918) 

The  first  known  use  of  augmented  matrices  appeared  between  200  B.C. 
and 100 B.C. in a Chinese manuscript entitled Nine Chapters of Mathematical Art. The
coefficients  were  arranged  in  columns  rather  than  in  rows,  as  today,  but  remarkably  the 
system  was  solved  by  performing  a succession  of  operations  on  the  columns.  The  actual 
use  of  the  term  augmented  matrix  appears  to  have  been  introduced  by  the  American 
mathematician  Maxime  Bocher  in  his  book  Introduction  to  Higher  Algebra,  published  in 
1907.  In  addition  to  being  an  outstanding  research  mathematician  and  an  expert  in  Latin, 
chemistry,  philosophy,  zoology,  geography,  meteorology,  art,  and  music,  Bocher  was  an 
outstanding  expositor  of  mathematics  whose  elementary  textbooks  were  greatly 
appreciated  by  students  and  are  still  in  demand  today. 

[Image:  Courtesy  of  the  American  Mathematical  Society] 


Concept Review

Linear equation
Homogeneous linear equation
System of linear equations
Solution of a linear system
Ordered n-tuple
Consistent linear system
Inconsistent linear system
Parameter
Parametric equations
Augmented matrix
Elementary row operations

Skills

Determine whether a given equation is linear.
Determine whether a given n-tuple is a solution of a linear system.
Find the augmented matrix of a linear system.
Find the linear system corresponding to a given augmented matrix.
Perform elementary row operations on a linear system and on its corresponding augmented matrix.
Determine whether a linear system is consistent or inconsistent.
Find the set of solutions to a consistent linear system.


Exercise Set 1.1

1. In each part, determine whether the equation is linear in $x_1$, $x_2$, and $x_3$.

(b) $x_1 + 3x_2 + x_1 x_3 = 2$

(c) $x_1 = -7x_2 + 3x_3$

(d) $x_1^{-2} + x_2 + 8x_3 = 5$

(e) $x_1^{3/5} - 2x_2 + x_3 = 4$

Answer:

(a), (c), and (f) are linear equations; (b), (d) and (e) are not linear equations

2. In each part, determine whether the equations form a linear system.

(a) $-2x + 4y + z = 2$

(b) $x = 4$
    $2x = 8$

(c) $4x - y + 2z = -1$
    $-x + (\ln 2)y - 3z = 0$

(d) $3z + x = -4$
    $y + 5z = 1$
    $6x + 2z = 3$
    $-x - y - z = 4$

3.  In  each  part,  determine  whether  the  equations  form  a linear  system. 

(a) $2x_1 - x_4 = 5$
    $-x_1 + 5x_2 + 3x_3 - 2x_4 = -1$

(b) $\sin(2x_1 + x_3) = \sqrt{5}$
    $e^{2x_2 - 2x_4} = \dfrac{1}{x_2}$
    $4x_4 = 4$

(c) $-x_2 + 2x_3 = 0$
    $2x_1 + x_2 - x_3 x_4 = 3$
    $-x_1 + 5x_2 - x_4 = -1$

(d) $x_1 + x_2 = x_3 + x_4$

Answer: 

(a)  and  (d)  are  linear  systems;  (b)  and  (c)  are  not  linear  systems 

4.  For  each  system  in  Exercise  2 that  is  linear,  determine  whether  it  is  consistent. 

5.  For  each  system  in  Exercise  3 that  is  linear,  determine  whether  it  is  consistent. 

Answer: 

(a)  and  (d)  are  both  consistent 

6.  Write  a system  of  linear  equations  consisting  of  three  equations  in  three  unknowns  with 

(a)  no  solutions. 

(b)  exactly  one  solution. 

(c)  infinitely  many  solutions. 

7.  In  each  part,  determine  whether  the  given  vector  is  a solution  of  the  linear  system 

\[
\begin{aligned}
2x_1 - 4x_2 - x_3 &= 1 \\
x_1 - 3x_2 + x_3 &= 1 \\
3x_1 - 5x_2 - 3x_3 &= 1
\end{aligned}
\]

(a) (3, 1, 1)

(b) (3, -1, 1)

(c) (13, 5, 2)

(e) (17, 7, 5)

Answer: 

(a),  (d),  and  (e)  are  solutions;  (b)  and  (c)  are  not  solutions 

8.  In  each  part,  determine  whether  the  given  vector  is  a solution  of  the  linear  system 

\[
\begin{aligned}
x_1 + 2x_2 - 2x_3 &= 3 \\
3x_1 - x_2 + x_3 &= 1 \\
-x_1 + 5x_2 - 5x_3 &= 5
\end{aligned}
\]

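(a) $\left(\tfrac{5}{7}, \tfrac{8}{7}, 1\right)$

(b) $\left(\tfrac{5}{7}, \tfrac{8}{7}, 0\right)$

(c) (5, 8, 1)

(d) $\left(\tfrac{5}{7}, \tfrac{10}{7}, \tfrac{2}{7}\right)$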

9.  In  each  part,  find  the  solution  set  of  the  linear  equation  by  using  parameters  as  necessary. 

(a) $7x - 5y = 3$

(b) $-8x_1 + 2x_2 - 5x_3 + 6x_4 = 1$

Answer: 


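(a) $x = \tfrac{5}{7}t + \tfrac{3}{7}$, $y = t$

(b) $x_1 = \tfrac{1}{4}r - \tfrac{5}{8}s + \tfrac{3}{4}t - \tfrac{1}{8}$
    $x_2 = r$
    $x_3 = s$
    $x_4 = t$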

10.  In  each  part,  find  the  solution  set  of  the  linear  equation  by  using  parameters  as  necessary. 

(a) $3x_1 - 5x_2 + 4x_3 = 7$

(b) $3v - 8w + 2x - y + 4z = 0$

11.  In  each  part,  find  a system  of  linear  equations  corresponding  to  the  given  augmented  matrix 


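(a) \[ \begin{bmatrix} 2 & 0 & 0 \\ 3 & -4 & 0 \\ 0 & 1 & 1 \end{bmatrix} \]

(b) \[ \begin{bmatrix} 3 & 0 & -2 & 5 \\ 7 & 1 & 4 & -3 \\ 0 & -2 & 1 & 7 \end{bmatrix} \]

(c) \[ \begin{bmatrix} 7 & 2 & 1 & -3 & 5 \\ 1 & 2 & 4 & 0 & 1 \end{bmatrix} \]

(d) \[ \begin{bmatrix} 1 & 0 & 0 & 0 & 7 \\ 0 & 1 & 0 & 0 & -2 \\ 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 1 & 4 \end{bmatrix} \]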

Answer: 


(a) $2x_1 = 0$
    $3x_1 - 4x_2 = 0$
    $x_2 = 1$

(b) $3x_1 - 2x_3 = 5$
    $7x_1 + x_2 + 4x_3 = -3$
    $-2x_2 + x_3 = 7$

(c) $7x_1 + 2x_2 + x_3 - 3x_4 = 5$
    $x_1 + 2x_2 + 4x_3 = 1$

(d) $x_1 = 7$
    $x_2 = -2$
    $x_3 = 3$
    $x_4 = 4$


12.  In  each  part,  find  a system  of  linear  equations  corresponding  to  the  given  augmented  matrix. 


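(a)
\[
\begin{bmatrix} 2 & -1 \\ -4 & -6 \\ 1 & -1 \\ 3 & 0 \end{bmatrix}
\]

(b)
\[
\begin{bmatrix} 0 & 3 & -1 & -1 & -1 \\ 5 & 2 & 0 & -3 & -6 \end{bmatrix}
\]

(c)
\[
\begin{bmatrix} 1 & 2 & 3 & 4 \\ -4 & -3 & -2 & -1 \\ 5 & -6 & 1 & 1 \\ -8 & 0 & 0 & 3 \end{bmatrix}
\]

(d)
\[
\begin{bmatrix} 3 & 0 & 1 & -4 & 3 \\ -4 & 0 & 4 & 1 & -3 \\ -1 & 3 & 0 & -2 & -9 \\ 0 & 0 & 0 & -1 & -2 \end{bmatrix}
\]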

13.  In  each  part,  find  the  augmented  matrix  for  the  given  system  of  linear  equations. 


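(a)
\[
\begin{aligned}
-2x_1 &= 6 \\
3x_1 &= 8 \\
9x_1 &= -3
\end{aligned}
\]

(b)
\[
\begin{aligned}
6x_1 - x_2 + 3x_3 &= 4 \\
5x_2 - x_3 &= 1
\end{aligned}
\]

(c)
\[
\begin{aligned}
2x_2 - 3x_4 + x_5 &= 0 \\
-3x_1 - x_2 + x_3 &= -1 \\
6x_1 + 2x_2 - x_3 + 2x_4 - 3x_5 &= 6
\end{aligned}
\]

(d) $x_1 - x_5 = 7$
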
Answer: 

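(a)
\[
\begin{bmatrix} -2 & 6 \\ 3 & 8 \\ 9 & -3 \end{bmatrix}
\]

(b)
\[
\begin{bmatrix} 6 & -1 & 3 & 4 \\ 0 & 5 & -1 & 1 \end{bmatrix}
\]

(c)
\[
\begin{bmatrix} 0 & 2 & 0 & -3 & 1 & 0 \\ -3 & -1 & 1 & 0 & 0 & -1 \\ 6 & 2 & -1 & 2 & -3 & 6 \end{bmatrix}
\]

(d)
\[
\begin{bmatrix} 1 & 0 & 0 & 0 & -1 & 7 \end{bmatrix}
\]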

14.  In  each  part,  find  the  augmented  matrix  for  the  given  system  of  linear  equations. 

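(a)
\[
\begin{aligned}
3x_1 - 2x_2 &= -1 \\
4x_1 + 5x_2 &= 3 \\
7x_1 + 3x_2 &= 2
\end{aligned}
\]

(b)
\[
\begin{aligned}
2x_1 + 2x_3 &= 1 \\
3x_1 - x_2 + 4x_3 &= 7 \\
6x_1 + x_2 - x_3 &= 0
\end{aligned}
\]

(c)
\[
\begin{aligned}
x_1 + 2x_2 - x_4 + x_5 &= 1 \\
3x_2 + x_3 - x_5 &= 2 \\
x_3 + 7x_4 &= 1
\end{aligned}
\]

(d)
\[
\begin{aligned}
x_1 &= 1 \\
x_2 &= 2 \\
x_3 &= 3
\end{aligned}
\]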

15. The curve $y = ax^2 + bx + c$ shown in the accompanying figure passes through the points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$. Show that the coefficients a, b, and c are a solution of the system of linear equations whose augmented matrix is

\[
\begin{bmatrix} x_1^2 & x_1 & 1 & y_1 \\ x_2^2 & x_2 & 1 & y_2 \\ x_3^2 & x_3 & 1 & y_3 \end{bmatrix}
\]


[Figure Ex-15: graph of the curve $y = ax^2 + bx + c$ passing through the points $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$.]

16.  Explain  why  each  of  the  three  elementary  row  operations  does  not  affect  the  solution  set  of  a linear  system. 

17.  Show  that  if  the  linear  equations 

\[
x_1 + kx_2 = c \quad \text{and} \quad x_1 + lx_2 = d
\]

have the same solution set, then the two equations are identical (i.e., $k = l$ and $c = d$).

True-False  Exercises 


In  parts  (a)-(h)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  A linear  system  whose  equations  are  all  homogeneous  must  be  consistent. 

Answer: 

True 

(b)  Multiplying  a linear  equation  through  by  zero  is  an  acceptable  elementary  row  operation. 

Answer: 

False 

(c)  The  linear  system 

\[
\begin{aligned}
x - y &= 3 \\
2x - 2y &= k
\end{aligned}
\]

cannot  have  a unique  solution,  regardless  of  the  value  of  k . 

Answer: 

True 

(d)  A single  linear  equation  with  two  or  more  unknowns  must  always  have  infinitely  many  solutions. 
Answer: 

True 

(e)  If  the  number  of  equations  in  a linear  system  exceeds  the  number  of  unknowns,  then  the  system  must  be 
inconsistent. 

Answer: 


False 


(f)  If  each  equation  in  a consistent  linear  system  is  multiplied  through  by  a constant  c,  then  all  solutions  to  the 
new  system  can  be  obtained  by  multiplying  solutions  from  the  original  system  by  c. 

Answer: 

False 

(g)  Elementary  row  operations  permit  one  equation  in  a linear  system  to  be  subtracted  from  another. 

Answer: 

True 

(h)  The  linear  system  with  corresponding  augmented  matrix 

2 -1 

0 0 

is  consistent. 

Answer: 

False 
1.2 Gaussian Elimination

In this section we will develop a systematic procedure for solving systems of linear equations. The procedure is based on the idea of performing certain operations on the rows of the augmented matrix for the system that simplify it to a form from which the solution of the system can be ascertained by inspection.


Considerations  in  Solving  Linear  Systems 

When  considering  methods  for  solving  systems  of  linear  equations,  it  is  important  to  distinguish  between  large  systems 
that  must  be  solved  by  computer  and  small  systems  that  can  be  solved  by  hand.  For  example,  there  are  many  applications 
that  lead  to  linear  systems  in  thousands  or  even  millions  of  unknowns.  Large  systems  require  special  techniques  to  deal 
with  issues  of  memory  size,  roundoff  errors,  solution  time,  and  so  forth.  Such  techniques  are  studied  in  the  field  of 
numerical  analysis  and  will  only  be  touched  on  in  this  text.  However,  almost  all  of  the  methods  that  are  used  for  large 
systems  are  based  on  the  ideas  that  we  will  develop  in  this  section. 


Echelon  Forms 


In  Example  6 of  the  last  section,  we  solved  a linear  system  in  the  unknowns  x,  y,  and  z by  reducing  the  augmented  matrix 
to  the  form 

\[
\begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{bmatrix}
\]

from which the solution $x = 1$, $y = 2$, $z = 3$ became evident. This is an example of a matrix that is in reduced row echelon form. To be of this form, a matrix must have the following properties:

1. If a row does not consist entirely of zeros, then the first nonzero number in the row is a 1. We call this a leading 1.

2. If there are any rows that consist entirely of zeros, then they are grouped together at the bottom of the matrix.

3. In any two successive rows that do not consist entirely of zeros, the leading 1 in the lower row occurs farther to the right than the leading 1 in the higher row.

4. Each column that contains a leading 1 has zeros everywhere else in that column.

A matrix  that  has  the  first  three  properties  is  said  to  be  in  row  echelon  form.  (Thus,  a matrix  in  reduced  row  echelon 
form  is  of  necessity  in  row  echelon  form,  but  not  conversely.) 
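
The four properties can be checked mechanically. The following is a minimal Python sketch, not part of the text; the function names `leading_index`, `is_row_echelon`, and `is_reduced_row_echelon` are illustrative choices, and matrices are assumed to be given as lists of rows with exact (integer or rational) entries.

```python
def leading_index(row):
    """Return the column index of the first nonzero entry, or None for a zero row."""
    for j, entry in enumerate(row):
        if entry != 0:
            return j
    return None


def is_row_echelon(matrix):
    """Check the first three properties: leading 1's, zero rows last, staircase pattern."""
    previous_lead = -1
    seen_zero_row = False
    for row in matrix:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
            continue
        if seen_zero_row:            # a nonzero row lies below a zero row
            return False
        if row[lead] != 1:           # the first nonzero entry must be a 1
            return False
        if lead <= previous_lead:    # each leading 1 must lie strictly to the right
            return False
        previous_lead = lead
    return True


def is_reduced_row_echelon(matrix):
    """Check the fourth property as well: a leading 1 is alone in its column."""
    if not is_row_echelon(matrix):
        return False
    for i, row in enumerate(matrix):
        lead = leading_index(row)
        if lead is None:
            continue
        for k, other in enumerate(matrix):
            if k != i and other[lead] != 0:
                return False
    return True


# The matrix reduced in the text passes both checks.
print(is_reduced_row_echelon([[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3]]))
```

A matrix that passes `is_row_echelon` but fails `is_reduced_row_echelon` has a nonzero entry above at least one leading 1.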


EXAMPLE  1 Row  Echelon  and  Reduced  Row  Echelon  Form 


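The following matrices are in reduced row echelon form.

\[
\begin{bmatrix} 1 & 0 & 0 & 4 \\ 0 & 1 & 0 & 7 \\ 0 & 0 & 1 & -1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 1 & -2 & 0 & 1 \\ 0 & 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]

The following matrices are in row echelon form but not reduced row echelon form.

\[
\begin{bmatrix} 1 & 4 & -3 & 7 \\ 0 & 1 & 6 & 2 \\ 0 & 0 & 1 & 5 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 0 & 1 & 2 & 6 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\]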


EXAMPLE  2 More  on  Row  Echelon  and  Reduced  Row  Echelon  Form 


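As Example 1 illustrates, a matrix in row echelon form has zeros below each leading 1, whereas a matrix in reduced row echelon form has zeros below and above each leading 1. Thus, with any real numbers substituted for the *'s, all matrices of the following types are in row echelon form:

\[
\begin{bmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 1 & * & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},
\]

\[
\begin{bmatrix}
0 & 1 & * & * & * & * & * & * & * & * \\
0 & 0 & 0 & 1 & * & * & * & * & * & * \\
0 & 0 & 0 & 0 & 1 & * & * & * & * & * \\
0 & 0 & 0 & 0 & 0 & 1 & * & * & * & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & *
\end{bmatrix}
\]

All matrices of the following types are in reduced row echelon form:

\[
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 & * \\ 0 & 1 & 0 & * \\ 0 & 0 & 1 & * \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & * & * \\ 0 & 1 & * & * \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},
\]

\[
\begin{bmatrix}
0 & 1 & * & 0 & 0 & 0 & * & * & 0 & * \\
0 & 0 & 0 & 1 & 0 & 0 & * & * & 0 & * \\
0 & 0 & 0 & 0 & 1 & 0 & * & * & 0 & * \\
0 & 0 & 0 & 0 & 0 & 1 & * & * & 0 & * \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & *
\end{bmatrix}
\]
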

If,  by  a sequence  of  elementary  row  operations,  the  augmented  matrix  for  a system  of  linear  equations  is  put  in  reduced 
row  echelon  form,  then  the  solution  set  can  be  obtained  either  by  inspection  or  by  converting  certain  linear  equations  to 
parametric  form.  Here  are  some  examples. 

In  Example  3 we  could,  if  desired,  express  the 
solution  more  succinctly  as  the  4-tuple  (3,  -1,  0,  5). 


EXAMPLE  3 Unique  Solution 

Suppose that the augmented matrix for a linear system in the unknowns $x_1$, $x_2$, $x_3$, and $x_4$ has been reduced by elementary row operations to

\[
\begin{bmatrix} 1 & 0 & 0 & 0 & 3 \\ 0 & 1 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 5 \end{bmatrix}
\]

This matrix is in reduced row echelon form and corresponds to the equations

\[
\begin{aligned}
x_1 &= 3 \\
x_2 &= -1 \\
x_3 &= 0 \\
x_4 &= 5
\end{aligned}
\]

Thus, the system has a unique solution, namely, $x_1 = 3$, $x_2 = -1$, $x_3 = 0$, $x_4 = 5$.


EXAMPLE  4 Linear  Systems  in  Three  Unknowns 


In  each  part,  suppose  that  the  augmented  matrix  for  a linear  system  in  the  unknowns  x,  y,  and  z has  been 
reduced  by  elementary  row  operations  to  the  given  reduced  row  echelon  form.  Solve  the  system. 


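(a)
\[
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

(b)
\[
\begin{bmatrix} 1 & 0 & 3 & -1 \\ 0 & 1 & -4 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

(c)
\[
\begin{bmatrix} 1 & -5 & 1 & 4 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]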

Solution 

(a) The equation that corresponds to the last row of the augmented matrix is

\[
0x + 0y + 0z = 1
\]

Since this equation is not satisfied by any values of x, y, and z, the system is inconsistent.

(b) The equation that corresponds to the last row of the augmented matrix is

\[
0x + 0y + 0z = 0
\]

This equation can be omitted since it imposes no restrictions on x, y, and z; hence, the linear system corresponding to the augmented matrix is

\[
\begin{aligned}
x + 3z &= -1 \\
y - 4z &= 2
\end{aligned}
\]

Since x and y correspond to the leading 1's in the augmented matrix, we call these the leading variables. The remaining variables (in this case z) are called free variables. Solving for the leading variables in terms of the free variables gives

\[
\begin{aligned}
x &= -1 - 3z \\
y &= 2 + 4z
\end{aligned}
\]

From these equations we see that the free variable z can be treated as a parameter and assigned an arbitrary value, t, which then determines values for x and y. Thus, the solution set can be represented by the parametric equations

\[
x = -1 - 3t, \quad y = 2 + 4t, \quad z = t
\]

By substituting various values for t in these equations we can obtain various solutions of the system. For example, setting $t = 0$ yields the solution

\[
x = -1, \quad y = 2, \quad z = 0
\]

and setting $t = 1$ yields the solution

\[
x = -4, \quad y = 6, \quad z = 1
\]

(c) As explained in part (b), we can omit the equations corresponding to the zero rows, in which case the linear system associated with the augmented matrix consists of the single equation

\[
x - 5y + z = 4 \tag{1}
\]

from which we see that the solution set is a plane in three-dimensional space. Although (1) is a valid form of the solution set, there are many applications in which it is preferable to express the solution set in parametric form. We can convert (1) to parametric form by solving for the leading variable x in terms of the free variables y and z to obtain

\[
x = 4 + 5y - z
\]

From this equation we see that the free variables can be assigned arbitrary values, say $y = s$ and $z = t$, which then determine the value of x. Thus, the solution set can be expressed parametrically as

\[
x = 4 + 5s - t, \quad y = s, \quad z = t \tag{2}
\]


We will usually denote parameters in a general solution by the letters r, s, t, ..., but any letters that do not conflict with the names of the unknowns can be used. For systems with more than three unknowns, subscripted letters such as $t_1, t_2, \ldots$ are convenient.


Formulas, such as (2), that express the solution set of a linear system parametrically have some associated terminology.


DEFINITION  1 

If a linear system has infinitely many solutions, then a set of parametric equations from which all solutions can be obtained by assigning numerical values to the parameters is called a general solution of the system.


Elimination  Methods 

We  have  just  seen  how  easy  it  is  to  solve  a system  of  linear  equations  once  its  augmented  matrix  is  in  reduced  row 
echelon  form.  Now  we  will  give  a step-by-step  elimination  procedure  that  can  be  used  to  reduce  any  matrix  to  reduced 
row  echelon  form.  As  we  state  each  step  in  the  procedure,  we  illustrate  the  idea  by  reducing  the  following  matrix  to 
reduced  row  echelon  form. 


\[
\begin{bmatrix} 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -10 & 6 & 12 & 28 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}
\]

Step  1.  Locate  the  leftmost  column  that  does  not  consist  entirely  of  zeros. 

\[
\begin{bmatrix} 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -10 & 6 & 12 & 28 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}
\]

← Leftmost nonzero column (column 1).

Step  2.  Interchange  the  top  row  with  another  row,  if  necessary,  to  bring  a nonzero  entry  to  the  top  of  the  column  found  in 
Step  1. 


\[
\begin{bmatrix} 2 & 4 & -10 & 6 & 12 & 28 \\ 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}
\]

← The first and second rows in the preceding matrix were interchanged.

Step 3. If the entry that is now at the top of the column found in Step 1 is a, multiply the first row by 1/a in order to introduce a leading 1.

\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}
\]

← The first row of the preceding matrix was multiplied by 1/2.


Step  4.  Add  suitable  multiples  of  the  top  row  to  the  rows  below  so  that  all  entries  below  the  leading  1 become  zeros. 


\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & -2 & 0 & 7 & 12 \\ 0 & 0 & 5 & 0 & -17 & -29 \end{bmatrix}
\]

← -2 times the first row of the preceding matrix was added to the third row.


Step  5.  Now  cover  the  top  row  in  the  matrix  and  begin  again  with  Step  1 applied  to  the  submatrix  that  remains.  Continue 
in  this  way  until  the  entire  matrix  is  in  row  echelon  form. 


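\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & -2 & 0 & 7 & 12 \\ 0 & 0 & 5 & 0 & -17 & -29 \end{bmatrix}
\]

← Leftmost nonzero column in the submatrix (column 3).

\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 5 & 0 & -17 & -29 \end{bmatrix}
\]

← The first row in the submatrix was multiplied by -1/2 to introduce a leading 1.

\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 0 & 0 & \tfrac{1}{2} & 1 \end{bmatrix}
\]

← -5 times the first row of the submatrix was added to the second row of the submatrix to introduce a zero below the leading 1.

\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 0 & 0 & \tfrac{1}{2} & 1 \end{bmatrix}
\]

← The top row in the submatrix was covered, and we returned to Step 1. Leftmost nonzero column in the new submatrix (column 5).

\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}
\]

← The first (and only) row in the new submatrix was multiplied by 2 to introduce a leading 1.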


The  entire  matrix  is  now  in  row  echelon  form.  To  find  the  reduced  row  echelon  form  we  need  the  following  additional 
step. 

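Step 6. Beginning with the last nonzero row and working upward, add suitable multiples of each row to the rows above to introduce zeros above the leading 1's.

\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}
\]

← 7/2 times the third row of the preceding matrix was added to the second row.

\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 0 & 2 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}
\]

← -6 times the third row was added to the first row.

\[
\begin{bmatrix} 1 & 2 & 0 & 3 & 0 & 7 \\ 0 & 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}
\]

← 5 times the second row was added to the first row.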


The  last  matrix  is  in  reduced  row  echelon  form. 


The procedure (or algorithm) we have just described for reducing a matrix to reduced row echelon form is called Gauss-Jordan elimination. This algorithm consists of two parts, a forward phase in which zeros are introduced below the leading 1's and then a backward phase in which zeros are introduced above the leading 1's. If only the forward phase is used, then the procedure produces a row echelon form only and is called Gaussian elimination. For example, in the preceding computations a row echelon form was obtained at the end of Step 5.
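
The six steps translate fairly directly into code. The following Python sketch is one possible rendering, not the text's own implementation; the function name `gauss_jordan` is an illustrative choice, and exact rational arithmetic (`fractions.Fraction`) is assumed so that the output matches hand computation.

```python
from fractions import Fraction


def gauss_jordan(matrix):
    """Return the reduced row echelon form of `matrix` using exact fractions."""
    a = [[Fraction(x) for x in row] for row in matrix]
    rows, cols = len(a), len(a[0])
    pivots = []   # (row, column) of each leading 1
    top = 0       # first row of the submatrix still to be processed

    # Forward phase (Steps 1-5): produce a row echelon form.
    for col in range(cols):
        if top == rows:
            break
        # Steps 1-2: look for a nonzero entry in this column at or below `top`.
        pivot = next((r for r in range(top, rows) if a[r][col] != 0), None)
        if pivot is None:
            continue
        a[top], a[pivot] = a[pivot], a[top]
        # Step 3: scale the row to introduce a leading 1.
        lead = a[top][col]
        a[top] = [x / lead for x in a[top]]
        # Step 4: add multiples of the row to clear the entries below the leading 1.
        for r in range(top + 1, rows):
            factor = a[r][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[top])]
        pivots.append((top, col))
        top += 1   # Step 5: "cover" the row and continue with the submatrix

    # Backward phase (Step 6): clear the entries above each leading 1,
    # beginning with the last nonzero row and working upward.
    for r, col in reversed(pivots):
        for above in range(r):
            factor = a[above][col]
            a[above] = [x - factor * y for x, y in zip(a[above], a[r])]
    return a


# The matrix reduced step by step in the text.
worked = [
    [0, 0, -2, 0, 7, 12],
    [2, 4, -10, 6, 12, 28],
    [2, 4, -5, 6, -5, -1],
]
for row in gauss_jordan(worked):
    print([str(x) for x in row])
```

Stopping after the forward loop would give Gaussian elimination only, that is, a row echelon form rather than the reduced row echelon form.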


Carl  Friedrich  Gauss  (1777-1855) 


Although  versions  of  Gaussian  elimination  were  known  much  earlier,  the  power  of  the  method 
was  not  recognized  until  the  great  German  mathematician  Carl  Friedrich  Gauss  used  it  to  compute  the  orbit  of 
the  asteroid  Ceres  from  limited  data.  What  happened  was  this:  On  January  1,  1801  the  Sicilian  astronomer 
Giuseppe  Piazzi  (1746-1826)  noticed  a dim  celestial  object  that  he  believed  might  be  a “missing  planet.”  He 
named  the  object  Ceres  and  made  a limited  number  of  positional  observations  but  then  lost  the  object  as  it  neared 
the  Sun.  Gauss  undertook  the  problem  of  computing  the  orbit  from  the  limited  data  using  least  squares  and  the 
procedure that we now call Gaussian elimination. The work of Gauss caused a sensation when Ceres reappeared a year later in the constellation Virgo at almost the precise position that Gauss predicted! The method was further popularized by the German engineer Wilhelm Jordan in his handbook on geodesy (the science of measuring Earth shapes) entitled Handbuch der Vermessungskunde and published in 1888.



EXAMPLE  5 Gauss-Jordan  Elimination 


Solve  by  Gauss-Jordan  elimination. 

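\[
\begin{aligned}
x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\
2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= -1 \\
5x_3 + 10x_4 + 15x_6 &= 5 \\
2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 6
\end{aligned}
\]

The augmented matrix for the system is

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 & -1 \\
0 & 0 & 5 & 10 & 0 & 15 & 5 \\
2 & 6 & 0 & 8 & 4 & 18 & 6
\end{bmatrix}
\]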

Adding -2 times the first row to the second and fourth rows gives

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & -1 & -2 & 0 & -3 & -1 \\
0 & 0 & 5 & 10 & 0 & 15 & 5 \\
0 & 0 & 4 & 8 & 0 & 18 & 6
\end{bmatrix}
\]


Multiplying the second row by -1 and then adding -5 times the new second row to the third row and -4 times the new second row to the fourth row gives

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 6 & 2
\end{bmatrix}
\]


Interchanging the third and fourth rows and then multiplying the third row of the resulting matrix by 1/6 gives the row echelon form

\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

This completes the forward phase since there are zeros below the leading 1's.


Adding -3 times the third row to the second row and then adding 2 times the second row of the resulting matrix to the first row yields the reduced row echelon form

\[
\begin{bmatrix}
1 & 3 & 0 & 4 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

This completes the backward phase since there are zeros above the leading 1's.


The corresponding system of equations is

\[
\begin{aligned}
x_1 + 3x_2 + 4x_4 + 2x_5 &= 0 \\
x_3 + 2x_4 &= 0 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\tag{3}
\]

Note that in constructing the linear system in (3) we ignored the row of zeros in the corresponding augmented matrix. Why is this justified?

Solving for the leading variables we obtain

\[
\begin{aligned}
x_1 &= -3x_2 - 4x_4 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Finally, we express the general solution of the system parametrically by assigning the free variables $x_2$, $x_4$, and $x_5$ arbitrary values r, s, and t, respectively. This yields

\[
x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = \tfrac{1}{3}
\]
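
A general solution can be spot-checked by substituting it back into the original equations. The Python sketch below does this for the system of Example 5, taking $x_6 = 1/3$ from the third row of the reduced row echelon form; the helper names `general_solution` and `satisfies_system` and the sample parameter values are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from itertools import product

# Coefficients and right-hand sides of the four equations in Example 5.
A = [
    [1, 3, -2, 0, 2, 0],
    [2, 6, -5, -2, 4, -3],
    [0, 0, 5, 10, 0, 15],
    [2, 6, 0, 8, 4, 18],
]
b = [0, -1, 5, 6]


def general_solution(r, s, t):
    """Return (x1, ..., x6) for the parameter values r, s, t."""
    return [-3 * r - 4 * s - 2 * t, r, -2 * s, s, t, Fraction(1, 3)]


def satisfies_system(x):
    """True if every equation holds exactly for the vector x."""
    return all(
        sum(coeff * value for coeff, value in zip(row, x)) == rhs
        for row, rhs in zip(A, b)
    )


# Substitute every combination of a few sample parameter values.
all_ok = all(
    satisfies_system(general_solution(r, s, t))
    for r, s, t in product([-2, 0, 1, 5], repeat=3)
)
print(all_ok)  # True
```

A check of this kind confirms that each parametric vector is a solution; it does not by itself show that every solution has been captured.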


Homogeneous  Linear  Systems 

A system of linear equations is said to be homogeneous if the constant terms are all zero; that is, the system has the form

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\
\vdots \qquad\qquad & \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0
\end{aligned}
\]

Every homogeneous system of linear equations is consistent because all such systems have $x_1 = 0, x_2 = 0, \ldots, x_n = 0$ as a solution. This solution is called the trivial solution; if there are other solutions, they are called nontrivial solutions.

Because a homogeneous linear system always has the trivial solution, there are only two possibilities for its solutions:

The system has only the trivial solution.

The system has infinitely many solutions in addition to the trivial solution.

In the special case of a homogeneous linear system of two equations in two unknowns, say

\[
\begin{aligned}
a_1x + b_1y &= 0 \quad (a_1, b_1 \text{ not both zero}) \\
a_2x + b_2y &= 0 \quad (a_2, b_2 \text{ not both zero})
\end{aligned}
\]

the graphs of the equations are lines through the origin, and the trivial solution corresponds to the point of intersection at the origin (Figure 1.2.1).


[Figure 1.2.1: two panels showing the lines $a_1x + b_1y = 0$ and $a_2x + b_2y = 0$ through the origin. In one panel the lines are distinct (only the trivial solution); in the other the lines coincide (infinitely many solutions).]

There  is  one  case  in  which  a homogeneous  system  is  assured  of  having  nontrivial  solutions — namely,  whenever  the 
system  involves  more  unknowns  than  equations.  To  see  why,  consider  the  following  example  of  four  equations  in  six 
unknowns. 

EXAMPLE  6 A Homogeneous  System 

Use  Gauss-Jordan  elimination  to  solve  the  homogeneous  linear  system 

\[
\begin{aligned}
x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\
2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= 0 \\
5x_3 + 10x_4 + 15x_6 &= 0 \\
2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 0
\end{aligned}
\tag{4}
\]


Observe  first  that  the  coefficients  of  the  unknowns  in  this  system  are  the  same  as  those  in 
Example  5;  that  is,  the  two  systems  differ  only  in  the  constants  on  the  right  side.  The  augmented  matrix  for 
the  given  homogeneous  system  is 


\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
2 & 6 & -5 & -2 & 4 & -3 & 0 \\
0 & 0 & 5 & 10 & 0 & 15 & 0 \\
2 & 6 & 0 & 8 & 4 & 18 & 0
\end{bmatrix}
\tag{5}
\]

which is the same as the augmented matrix for the system in Example 5, except for zeros in the last column. Thus, the reduced row echelon form of this matrix will be the same as that of the augmented matrix in Example 5, except for the last column. However, a moment's reflection will make it evident that a column of zeros is not changed by an elementary row operation, so the reduced row echelon form of (5) is

\[
\begin{bmatrix}
1 & 3 & 0 & 4 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\tag{6}
\]

The  corresponding  system  of  equations  is 

\[
\begin{aligned}
x_1 + 3x_2 + 4x_4 + 2x_5 &= 0 \\
x_3 + 2x_4 &= 0 \\
x_6 &= 0
\end{aligned}
\]

Solving for the leading variables we obtain

\[
\begin{aligned}
x_1 &= -3x_2 - 4x_4 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= 0
\end{aligned}
\tag{7}
\]

If we now assign the free variables $x_2$, $x_4$, and $x_5$ arbitrary values r, s, and t, respectively, then we can express the solution set parametrically as

\[
x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = 0
\]

Note that the trivial solution results when $r = s = t = 0$.


Free Variables in Homogeneous Linear Systems

Example 6 illustrates two important points about solving homogeneous linear systems:

Elementary row operations do not alter columns of zeros in a matrix, so the reduced row echelon form of the augmented matrix for a homogeneous linear system has a final column of zeros. This implies that the linear system corresponding to the reduced row echelon form is homogeneous, just like the original system.

When we constructed the homogeneous linear system corresponding to augmented matrix (6), we ignored the row of zeros because the corresponding equation

\[
0x_1 + 0x_2 + 0x_3 + 0x_4 + 0x_5 + 0x_6 = 0
\]

does not impose any conditions on the unknowns. Thus, depending on whether or not the reduced row echelon form of the augmented matrix for a homogeneous linear system has any zero rows, the linear system corresponding to that reduced row echelon form will either have the same number of equations as the original system or it will have fewer.

Now consider a general homogeneous linear system with n unknowns, and suppose that the reduced row echelon form of the augmented matrix has r nonzero rows. Since each nonzero row has a leading 1, and since each leading 1 corresponds to a leading variable, the homogeneous system corresponding to the reduced row echelon form of the augmented matrix must have r leading variables and n - r free variables. Thus, this system is of the form

\[
\begin{aligned}
x_{k_1} + \textstyle\sum(\;) &= 0 \\
x_{k_2} + \textstyle\sum(\;) &= 0 \\
\vdots \qquad & \\
x_{k_r} + \textstyle\sum(\;) &= 0
\end{aligned}
\]

where in each equation the expression $\sum(\;)$ denotes a sum that involves the free variables, if any [see (7), for example]. In summary, we have the following result.

THEOREM 1.2.1 Free Variable Theorem for Homogeneous Systems

If a homogeneous linear system has n unknowns, and if the reduced row echelon form of its augmented matrix has r nonzero rows, then the system has n - r free variables.


Note that Theorem 1.2.2 applies only to homogeneous systems; a nonhomogeneous system with more unknowns than equations need not be consistent. However, we will prove later that if a nonhomogeneous system with more unknowns than equations is consistent, then it has infinitely many solutions.


Theorem  1.2.1  has  an  important  implication  for  homogeneous  linear  systems  with  more  unknowns  than  equations. 
Specifically,  if  a homogeneous  linear  system  has  m equations  in  n unknowns,  and  if  m < n,  then  it  must  also  be  true  that 
r<n  (why?).  This  being  the  case,  the  theorem  implies  that  there  is  at  least  one  free  variable,  and  this  implies  in  turn  that 
the  system  has  infinitely  many  solutions.  Thus,  we  have  the  following  result. 


THEOREM  1.2.2 

A homogeneous  linear  system  with  more  unknowns  than  equations  has  infinitely  many  solutions. 


In  retrospect,  we  could  have  anticipated  that  the  homogeneous  system  in  Example  6 would  have  infinitely  many 
solutions  since  it  has  four  equations  in  six  unknowns. 
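
The count n - r can be read directly off a reduced row echelon form. The short Python sketch below applies it to the reduced matrix found in Example 6; the function name `free_variable_count` is an illustrative choice, and the input is assumed to be an augmented matrix already in reduced row echelon form.

```python
def free_variable_count(rref_augmented):
    """Return (n, r, n - r) for a homogeneous system's augmented matrix in RREF."""
    n = len(rref_augmented[0]) - 1   # the last column holds the constants
    r = sum(1 for row in rref_augmented if any(entry != 0 for entry in row))
    return n, r, n - r


# Reduced row echelon form obtained in Example 6.
rref = [
    [1, 3, 0, 4, 2, 0, 0],
    [0, 0, 1, 2, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
print(free_variable_count(rref))  # (6, 3, 3)
```

Here n = 6 and r = 3, so there are three free variables, matching the three parameters r, s, and t used in Example 6.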


Gaussian  Elimination  and  Back-Substitution 


For small linear systems that are solved by hand (such as most of those in this text), Gauss-Jordan elimination (reduction to reduced row echelon form) is a good procedure to use. However, for large linear systems that require a computer solution, it is generally more efficient to use Gaussian elimination (reduction to row echelon form) followed by a technique known as back-substitution to complete the process of solving the system. The next example illustrates this technique.

EXAMPLE  7 Example  5 Solved  by  Back-Substitution 

From  the  computations  in  Example  5,  a row  echelon  form  of  the  augmented  matrix  is 

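\[
\begin{bmatrix}
1 & 3 & -2 & 0 & 2 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & \tfrac{1}{3} \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

To solve the corresponding system of equations

\[
\begin{aligned}
x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\
x_3 + 2x_4 + 3x_6 &= 1 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

we proceed as follows:

Step 1. Solve the equations for the leading variables.

\[
\begin{aligned}
x_1 &= -3x_2 + 2x_3 - 2x_5 \\
x_3 &= 1 - 2x_4 - 3x_6 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Step 2. Beginning with the bottom equation and working upward, successively substitute each equation into all the equations above it.

Substituting $x_6 = \tfrac{1}{3}$ into the second equation yields

\[
\begin{aligned}
x_1 &= -3x_2 + 2x_3 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Substituting $x_3 = -2x_4$ into the first equation yields

\[
\begin{aligned}
x_1 &= -3x_2 - 4x_4 - 2x_5 \\
x_3 &= -2x_4 \\
x_6 &= \tfrac{1}{3}
\end{aligned}
\]

Step 3. Assign arbitrary values to the free variables, if any.

If we now assign $x_2$, $x_4$, and $x_5$ the arbitrary values r, s, and t, respectively, the general solution is given by the formulas

\[
x_1 = -3r - 4s - 2t, \quad x_2 = r, \quad x_3 = -2s, \quad x_4 = s, \quad x_5 = t, \quad x_6 = \tfrac{1}{3}
\]

This agrees with the solution obtained in Example 5.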


EXAMPLE  8 

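Suppose that the matrices below are augmented matrices for linear systems in the unknowns $x_1$, $x_2$, $x_3$, and $x_4$. These matrices are all in row echelon form but not reduced row echelon form. Discuss the existence and uniqueness of solutions to the corresponding linear systems.

(a)
\[
\begin{bmatrix} 1 & -3 & 7 & 2 & 5 \\ 0 & 1 & 2 & -4 & 1 \\ 0 & 0 & 1 & 6 & 9 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\]

(b)
\[
\begin{bmatrix} 1 & -3 & 7 & 2 & 5 \\ 0 & 1 & 2 & -4 & 1 \\ 0 & 0 & 1 & 6 & 9 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

(c)
\[
\begin{bmatrix} 1 & -3 & 7 & 2 & 5 \\ 0 & 1 & 2 & -4 & 1 \\ 0 & 0 & 1 & 6 & 9 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}
\]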

Solution 

(a) The last row corresponds to the equation

\[
0x_1 + 0x_2 + 0x_3 + 0x_4 = 1
\]

from which it is evident that the system is inconsistent.

(b) The last row corresponds to the equation

\[
0x_1 + 0x_2 + 0x_3 + 0x_4 = 0
\]

which has no effect on the solution set. In the remaining three equations the variables $x_1$, $x_2$, and $x_3$ correspond to leading 1's and hence are leading variables. The variable $x_4$ is a free variable. With a little algebra, the leading variables can be expressed in terms of the free variable, and the free variable can be assigned an arbitrary value. Thus, the system must have infinitely many solutions.

(c) The last row corresponds to the equation

\[
x_4 = 0
\]

which gives us a numerical value for $x_4$. If we substitute this value into the third equation, namely,

\[
x_3 + 6x_4 = 9
\]

we obtain $x_3 = 9$. You should now be able to see that if we continue this process and substitute the known values of $x_3$ and $x_4$ into the equation corresponding to the second row, we will obtain a unique numerical value for $x_2$; and if, finally, we substitute the known values of $x_4$, $x_3$, and $x_2$ into the equation corresponding to the first row, we will produce a unique numerical value for $x_1$. Thus, the system has a unique solution.
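
The substitution process described in part (c) can be carried out mechanically. The Python sketch below is a minimal illustration under a stated assumption: the augmented matrix is in row echelon form with a leading 1 in every row of a square coefficient part, so that the solution is unique. The function name `back_substitute` is an illustrative choice.

```python
from fractions import Fraction


def back_substitute(augmented):
    """Solve a system whose augmented matrix is in row echelon form with a
    leading 1 on each diagonal position (unique solution assumed)."""
    n = len(augmented)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):             # bottom equation first
        row = augmented[i]
        known = sum(row[j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(row[n]) - known         # the leading coefficient is 1
    return x


# Augmented matrix from part (c) of Example 8.
example_8c = [
    [1, -3, 7, 2, 5],
    [0, 1, 2, -4, 1],
    [0, 0, 1, 6, 9],
    [0, 0, 0, 1, 0],
]
solution = back_substitute(example_8c)
print([int(value) for value in solution])  # [-109, -17, 9, 0]
```

The values $x_4 = 0$ and $x_3 = 9$ agree with those found in the text; the remaining two follow from the second and first rows in turn.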


Some  Facts  About  Echelon  Forms 


There are three facts about row echelon forms and reduced row echelon forms that are important to know but we will not prove:

1. Every matrix has a unique reduced row echelon form; that is, regardless of whether you use Gauss-Jordan elimination or some other sequence of elementary row operations, the same reduced row echelon form will result in the end.

2. Row echelon forms are not unique; that is, different sequences of elementary row operations can result in different row echelon forms.

3. Although row echelon forms are not unique, all row echelon forms of a matrix A have the same number of zero rows, and the leading 1's always occur in the same positions in the row echelon forms of A. Those are called the pivot positions of A. A column that contains a pivot position is called a pivot column of A.

EXAMPLE  9 Pivot  Positions  and  Columns 


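Earlier in this section (immediately after Definition 1) we found a row echelon form of

\[
\begin{bmatrix} 0 & 0 & -2 & 0 & 7 & 12 \\ 2 & 4 & -10 & 6 & 12 & 28 \\ 2 & 4 & -5 & 6 & -5 & -1 \end{bmatrix}
\]

to be

\[
\begin{bmatrix} 1 & 2 & -5 & 3 & 6 & 14 \\ 0 & 0 & 1 & 0 & -\tfrac{7}{2} & -6 \\ 0 & 0 & 0 & 0 & 1 & 2 \end{bmatrix}
\]

The leading 1's occur in positions (row 1, column 1), (row 2, column 3), and (row 3, column 5). These are the pivot positions. The pivot columns are columns 1, 3, and 5.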
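
Pivot positions can be listed by scanning each row of a row echelon form for its first nonzero entry. The Python sketch below does this for the row echelon form in Example 9; the function name `pivot_positions` is an illustrative choice, and rows and columns are numbered from 1 to match the text.

```python
from fractions import Fraction


def pivot_positions(row_echelon):
    """Return the 1-based (row, column) position of the leading entry of each nonzero row."""
    positions = []
    for i, row in enumerate(row_echelon, start=1):
        for j, entry in enumerate(row, start=1):
            if entry != 0:
                positions.append((i, j))
                break
    return positions


# Row echelon form from Example 9.
ref = [
    [1, 2, -5, 3, 6, 14],
    [0, 0, 1, 0, Fraction(-7, 2), -6],
    [0, 0, 0, 0, 1, 2],
]
positions = pivot_positions(ref)
pivot_columns = [column for _, column in positions]
print(positions)      # [(1, 1), (2, 3), (3, 5)]
print(pivot_columns)  # [1, 3, 5]
```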


Roundoff  Error  and  Instability 

There  is  often  a gap  between  mathematical  theory  and  its  practical  implementation — Gauss-Jordan  elimination  and 
Gaussian  elimination  being  good  examples.  The  problem  is  that  computers  generally  approximate  numbers,  thereby 
introducing  roundoff  errors,  so  unless  precautions  are  taken,  successive  calculations  may  degrade  an  answer  to  a degree 
that  makes  it  useless.  Algorithms  (procedures)  in  which  this  happens  are  called  unstable.  There  are  various  techniques 
for  minimizing  roundoff  error  and  instability.  For  example,  it  can  be  shown  that  for  large  linear  systems  Gauss-Jordan 
elimination  involves  roughly  50%  more  operations  than  Gaussian  elimination,  so  most  computer  algorithms  are  based  on 
the  latter  method.  Some  of  these  matters  will  be  considered  in  Chapter  9. 
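
A small floating-point experiment shows how roundoff can ruin an answer. The Python sketch below solves a 2 x 2 system twice: once eliminating with a very small entry as the pivot, and once after interchanging rows so that the larger entry is used. The row interchange (often called partial pivoting) is one common precaution and is introduced here as an assumption for illustration; it is not a technique described in this section, and the function names are hypothetical.

```python
def solve_2x2_naive(a11, a12, b1, a21, a22, b2):
    """Eliminate using row 1 as the pivot row, with no row interchange."""
    m = a21 / a11                  # multiple of row 1 to subtract from row 2
    a22_new = a22 - m * a12
    b2_new = b2 - m * b1
    y = b2_new / a22_new           # back-substitution
    x = (b1 - a12 * y) / a11
    return x, y


def solve_2x2_pivoted(a11, a12, b1, a21, a22, b2):
    """Same elimination, but first put the larger first-column entry on top."""
    if abs(a21) > abs(a11):
        a11, a12, b1, a21, a22, b2 = a21, a22, b2, a11, a12, b1
    return solve_2x2_naive(a11, a12, b1, a21, a22, b2)


eps = 1e-20
# System: eps*x + y = 1 and x + y = 2. The exact solution is very close to x = 1, y = 1.
print(solve_2x2_naive(eps, 1.0, 1.0, 1.0, 1.0, 2.0))    # (0.0, 1.0)
print(solve_2x2_pivoted(eps, 1.0, 1.0, 1.0, 1.0, 2.0))  # (1.0, 1.0)
```

In the first call the huge multiplier 1/eps swamps the other entries, and the computed value of x is 0 rather than approximately 1; the same arithmetic performed after the row interchange gives an accurate result.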


Concept  Review 

Reduced  row  echelon  form 
Row  echelon  form 
Leading 1
Leading  variables 
Free  variables 

General  solution  to  a linear  system 
Gaussian  elimination 
Gauss-Jordan  elimination 
Forward  phase 
Backward  phase 
Homogeneous  linear  system 
Trivial  solution 
Nontrivial  solution 

Dimension  Theorem  for  Homogeneous  Systems 
Back-substitution

Skills 

Recognize  whether  a given  matrix  is  in  row  echelon  form,  reduced  row  echelon  form,  or  neither. 

Construct solutions to linear systems whose corresponding augmented matrices are in row echelon form or
reduced row echelon form.

Use  Gaussian  elimination  to  find  the  general  solution  of  a linear  system. 

Use  Gauss-Jordan  elimination  in  order  to  find  the  general  solution  of  a linear  system. 

Analyze  homogeneous  linear  systems  using  the  Free  Variable  Theorem  for  Homogeneous  Systems. 


Exercise Set 1.2


1.  In  each  part,  determine  whether  the  matrix  is  in  row  echelon  form,  reduced  row  echelon  form,  both,  or  neither. 


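(a) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 0 & 3 & 1 \\ 0 & 1 & 2 & 4 \end{bmatrix}$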


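(e) $\begin{bmatrix} 1 & 2 & 0 & 3 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(f) $\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$

(g) $\begin{bmatrix} 1 & -7 & 5 & 5 \\ 0 & 1 & 3 & 2 \end{bmatrix}$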

Answer: 

(a)  Both 

(b)  Both 

(c)  Both 

(d)  Both 

(e)  Both 

(f)  Both 

(g)  Row  echelon 

2.  In  each  part,  determine  whether  the  matrix  is  in  row  echelon  form,  reduced  row  echelon  form,  both,  or  neither. 

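(a) $\begin{bmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 3 & 4 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 5 & -3 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}$

(e) $\begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(f) $\begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 0 & 7 & 1 & 3 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(g) $\begin{bmatrix} 1 & -2 & 0 & 1 \\ 0 & 0 & 1 & -2 \end{bmatrix}$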

3. In each part, suppose that the augmented matrix for a system of linear equations has been reduced by row operations
to the given row echelon form. Solve the system.


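(a) $\begin{bmatrix} 1 & -3 & 4 & 7 \\ 0 & 1 & 2 & 2 \\ 0 & 0 & 1 & 5 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 8 & -5 & 6 \\ 0 & 1 & 4 & -9 & 3 \\ 0 & 0 & 1 & 1 & 2 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 7 & -2 & 0 & -8 & -3 \\ 0 & 0 & 1 & 1 & 6 & 5 \\ 0 & 0 & 0 & 1 & 3 & 9 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & -3 & 7 & 1 \\ 0 & 1 & 4 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$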

Answer: 

(a) $x_1 = -37$, $x_2 = -8$, $x_3 = 5$

(b) $x_1 = 13t - 10$, $x_2 = 13t - 5$, $x_3 = -t + 2$, $x_4 = t$

(c) $x_1 = -7s + 2t - 11$, $x_2 = s$, $x_3 = -3t - 4$, $x_4 = -3t + 9$, $x_5 = t$

(d)  Inconsistent 

4.  In  each  part,  suppose  that  the  augmented  matrix  for  a system  of  linear  equations  has  been  reduced  by  row  operations 
to  the  given  reduced  row  echelon  form.  Solve  the  system. 

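(a) $\begin{bmatrix} 1 & 0 & 0 & -3 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 7 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 & -7 & 8 \\ 0 & 1 & 0 & 3 & 2 \\ 0 & 0 & 1 & 1 & -5 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & -6 & 0 & 0 & 3 & -2 \\ 0 & 0 & 1 & 0 & 4 & 7 \\ 0 & 0 & 0 & 1 & 5 & 8 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & -3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$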

In  Exercises  5-8,  solve  the  linear  system  by  Gauss-Jordan  elimination. 

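5. \[ \begin{aligned} x_1 + x_2 + 2x_3 &= 8 \\ -x_1 - 2x_2 + 3x_3 &= 1 \\ 3x_1 - 7x_2 + 4x_3 &= 10 \end{aligned} \]

Answer:

$x_1 = 3$, $x_2 = 1$, $x_3 = 2$

6. \[ \begin{aligned} 2x_1 + 2x_2 + 2x_3 &= 0 \\ -2x_1 + 5x_2 + 2x_3 &= 1 \\ 8x_1 + x_2 + 4x_3 &= -1 \end{aligned} \]

7. \[ \begin{aligned} x - y + 2z - w &= -1 \\ 2x + y - 2z - 2w &= -2 \\ -x + 2y - 4z + w &= 1 \\ 3x - 3w &= -3 \end{aligned} \]

Answer:

$x = t - 1$, $y = 2s$, $z = s$, $w = t$

8. \[ \begin{aligned} -2b + 3c &= 1 \\ 3a + 6b - 3c &= -2 \\ 6a + 6b + 3c &= 5 \end{aligned} \]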

In  Exercises  9-12,  solve  the  linear  system  by  Gaussian  elimination. 

9.  Exercise  5 

Answer: 

$x_1 = 3$, $x_2 = 1$, $x_3 = 2$

10.  Exercise  6 

11.  Exercise  7 

Answer: 


$x = t - 1$, $y = 2s$, $z = s$, $w = t$

12.  Exercise  8 

In  Exercises  13-16,  determine  whether  the  homogeneous  system  has  nontrivial  solutions  by  inspection  (without  pencil 
and  paper). 

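13. \[ \begin{aligned} 2x_1 - 3x_2 + 4x_3 - x_4 &= 0 \\ 7x_1 + x_2 - 8x_3 + 9x_4 &= 0 \\ 2x_1 + 8x_2 + x_3 - x_4 &= 0 \end{aligned} \]

Answer:

Has nontrivial solutions

14. \[ \begin{aligned} x_1 + 3x_2 - x_3 &= 0 \\ x_2 - 8x_3 &= 0 \\ 4x_3 &= 0 \end{aligned} \]

15. \[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= 0 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= 0 \end{aligned} \]

Answer:

Has nontrivial solutions

16. \[ \begin{aligned} 3x_1 - 2x_2 &= 0 \\ 6x_1 - 4x_2 &= 0 \end{aligned} \]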

In  Exercises  17-24,  solve  the  given  homogeneous  linear  system  by  any  method. 


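17. \[ \begin{aligned} 2x_1 + x_2 + 3x_3 &= 0 \\ x_1 + 2x_2 &= 0 \\ x_2 + x_3 &= 0 \end{aligned} \]

Answer:

$x_1 = 0$, $x_2 = 0$, $x_3 = 0$

18. \[ \begin{aligned} 2x - y - 3z &= 0 \\ -x + 2y - 3z &= 0 \\ x + y + 4z &= 0 \end{aligned} \]

19. \[ \begin{aligned} 3x_1 + x_2 + x_3 + x_4 &= 0 \\ 5x_1 - x_2 + x_3 - x_4 &= 0 \end{aligned} \]

Answer:

$x_1 = -s$, $x_2 = -t - s$, $x_3 = 4s$, $x_4 = t$

20. \[ \begin{aligned} v + 3w - 2x &= 0 \\ 2u + v - 4w + 3x &= 0 \\ 2u + 3v + 2w - x &= 0 \\ -4u - 3v + 5w - 4x &= 0 \end{aligned} \]

21. \[ \begin{aligned} 2x + 2y + 4z &= 0 \\ w - y - 3z &= 0 \\ 2w + 3x + y + z &= 0 \\ -2w + x + 3y - 2z &= 0 \end{aligned} \]

Answer:

$w = t$, $x = -t$, $y = t$, $z = 0$

22. \[ \begin{aligned} x_1 + 3x_2 + x_4 &= 0 \\ x_1 + 4x_2 + 2x_3 &= 0 \\ -2x_2 - 2x_3 - x_4 &= 0 \\ 2x_1 - 4x_2 + x_3 + x_4 &= 0 \\ x_1 - 2x_2 - x_3 + x_4 &= 0 \end{aligned} \]

23. \[ \begin{aligned} 2I_1 - I_2 + 3I_3 + 4I_4 &= 9 \\ I_1 - 2I_3 + 7I_4 &= 11 \\ 3I_1 - 3I_2 + I_3 + 5I_4 &= 8 \\ 2I_1 + I_2 + 4I_3 + 4I_4 &= 10 \end{aligned} \]

Answer:

$I_1 = -1$, $I_2 = 0$, $I_3 = 1$, $I_4 = 2$

24. \[ \begin{aligned} Z_3 + Z_4 + Z_5 &= 0 \\ -Z_1 - Z_2 + 2Z_3 - 3Z_4 + Z_5 &= 0 \\ Z_1 + Z_2 - 2Z_3 - Z_5 &= 0 \\ 2Z_1 + 2Z_2 - Z_3 + Z_5 &= 0 \end{aligned} \]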


In Exercises 25-28, determine the values of a for which the system has no solutions, exactly one solution, or infinitely
many solutions.

25. \[ \begin{aligned} x + 2y - 3z &= 4 \\ 3x - y + 5z &= 2 \\ 4x + y + (a^2 - 14)z &= a + 2 \end{aligned} \]

Answer:

If $a = 4$, there are infinitely many solutions; if $a = -4$, there are no solutions; if $a \neq \pm 4$, there is exactly one
solution.

26. \[ \begin{aligned} x + 2y + z &= 2 \\ 2x - 2y + 3z &= 1 \\ x + 2y - (a^2 - 3)z &= a \end{aligned} \]

27. \[ \begin{aligned} x + 2y &= 1 \\ 2x + (a^2 - 5)y &= a - 1 \end{aligned} \]

Answer:

If $a = 3$, there are infinitely many solutions; if $a = -3$, there are no solutions; if $a \neq \pm 3$, there is exactly one
solution.

28. \[ \begin{aligned} x + y + 7z &= -7 \\ 2x + 3y + 17z &= -16 \\ x + 2y + (a^2 + 1)z &= 3a \end{aligned} \]

In  Exercises  29-30,  solve  the  following  systems,  where  a , b,  and  c are  constants. 

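29. \[ \begin{aligned} 2x + y &= a \\ 3x + 6y &= b \end{aligned} \]

Answer:

$x = \dfrac{2a}{3} - \dfrac{b}{9}$, $y = -\dfrac{a}{3} + \dfrac{2b}{9}$

30. \[ \begin{aligned} x_1 + x_2 + x_3 &= a \\ 2x_1 + 2x_3 &= b \\ 3x_2 + 3x_3 &= c \end{aligned} \]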

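31. Find two different row echelon forms of

\[ \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix} \]

This exercise shows that a matrix can have multiple row echelon forms.

Answer:

$\begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ are possible answers.

32. Reduce

\[ \begin{bmatrix} 2 & 1 & 3 \\ 0 & -2 & -29 \\ 3 & 4 & 5 \end{bmatrix} \]

to reduced row echelon form without introducing fractions at any intermediate stage.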

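33. Show that the following nonlinear system has 18 solutions if $0 \le \alpha \le 2\pi$, $0 \le \beta \le 2\pi$, and $0 \le \gamma \le 2\pi$.

\[ \begin{aligned} \sin\alpha + 2\cos\beta + 3\tan\gamma &= 0 \\ 2\sin\alpha + 5\cos\beta + 3\tan\gamma &= 0 \\ -\sin\alpha - 5\cos\beta + 5\tan\gamma &= 0 \end{aligned} \]

[Hint: Begin by making the substitutions $x = \sin\alpha$, $y = \cos\beta$, and $z = \tan\gamma$.]

34. Solve the following system of nonlinear equations for the unknown angles $\alpha$, $\beta$, and $\gamma$, where $0 \le \alpha \le 2\pi$,
$0 \le \beta \le 2\pi$, and $0 \le \gamma < \pi$.

\[ \begin{aligned} 2\sin\alpha - \cos\beta + 3\tan\gamma &= 3 \\ 4\sin\alpha + 2\cos\beta - 2\tan\gamma &= 2 \\ 6\sin\alpha - 3\cos\beta + \tan\gamma &= 9 \end{aligned} \]

35. Solve the following system of nonlinear equations for x, y, and z.

\[ \begin{aligned} x^2 + y^2 + z^2 &= 6 \\ x^2 - y^2 + 2z^2 &= 2 \\ 2x^2 + y^2 - z^2 &= 3 \end{aligned} \]

[Hint: Begin by making the substitutions $X = x^2$, $Y = y^2$, $Z = z^2$.]

Answer:

$x = \pm 1$, $y = \pm\sqrt{3}$, $z = \pm\sqrt{2}$

36. Solve the following system for x, y, and z.

\[ \begin{aligned} \frac{1}{x} + \frac{2}{y} - \frac{4}{z} &= 1 \\ \frac{2}{x} + \frac{3}{y} + \frac{8}{z} &= 0 \\ -\frac{1}{x} + \frac{9}{y} + \frac{10}{z} &= 5 \end{aligned} \]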


37. Find the coefficients a, b, c, and d so that the curve shown in the accompanying figure is the graph of the equation
$y = ax^3 + bx^2 + cx + d$.

[Figure Ex-37: curve passing through the points (0, 10), (1, 7), (3, -11), and (4, -14)]

Answer:

$a = 1$, $b = -6$, $c = 2$, $d = 10$

38. Find the coefficients a, b, c, and d so that the curve shown in the accompanying figure is given by the equation
$ax^2 + ay^2 + bx + cy + d = 0$.

[Figure Ex-38: curve passing through the points (-2, 7), (-4, 5), and (4, -3)]

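39. If the linear system

\[ \begin{aligned} a_1x + b_1y + c_1z &= 0 \\ a_2x - b_2y + c_2z &= 0 \\ a_3x + b_3y - c_3z &= 0 \end{aligned} \]

has only the trivial solution, what can be said about the solutions of the following system?

\[ \begin{aligned} a_1x + b_1y + c_1z &= 3 \\ a_2x - b_2y + c_2z &= 7 \\ a_3x + b_3y - c_3z &= 11 \end{aligned} \]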


Answer: 


The  nonhomogeneous  system  will  have  exactly  one  solution. 

40. (a) If A is a 3 × 5 matrix, then what is the maximum possible number of leading 1's in its reduced row echelon form?

(b) If B is a 3 × 6 matrix whose last column has all zeros, then what is the maximum possible number of parameters
in the general solution of the linear system with augmented matrix B?

(c) If C is a 5 × 3 matrix, then what is the minimum possible number of rows of zeros in any row echelon form of
C?

41. (a) Prove that if $ad - bc \neq 0$, then the reduced row echelon form of

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad \text{is} \quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

(b) Use the result in part (a) to prove that if $ad - bc \neq 0$, then the linear system

\[ \begin{aligned} ax + by &= k \\ cx + dy &= l \end{aligned} \]

has exactly one solution.

42. Consider the system of equations

\[ \begin{aligned} ax + by &= 0 \\ cx + dy &= 0 \\ ex + fy &= 0 \end{aligned} \]

Discuss the relative positions of the lines $ax + by = 0$, $cx + dy = 0$, and $ex + fy = 0$ when (a) the system has
only the trivial solution, and (b) the system has nontrivial solutions.

43. Describe all possible reduced row echelon forms of

(a) $\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$

(b) $\begin{bmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & p & q \end{bmatrix}$

True-False  Exercises 

In  parts  (a)-(i)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  If  a matrix  is  in  reduced  row  echelon  form,  then  it  is  also  in  row  echelon  form. 

Answer: 

True 

(b)  If  an  elementary  row  operation  is  applied  to  a matrix  that  is  in  row  echelon  form,  the  resulting  matrix  will  still  be  in 
row  echelon  form. 

Answer: 

False 

(c)  Every  matrix  has  a unique  row  echelon  form. 

Answer: 

False 

(d) A homogeneous linear system in n unknowns whose corresponding augmented matrix has a reduced row echelon
form with r leading 1's has n − r free variables.

Answer: 

True 

(e) All leading 1's in a matrix in row echelon form must occur in different columns.

Answer: 

True 

(f) If every column of a matrix in row echelon form has a leading 1, then all entries that are not leading 1's are zero.

Answer:

False 

(g) If a homogeneous linear system of n equations in n unknowns has a corresponding augmented matrix with a reduced
row echelon form containing n leading 1's, then the linear system has only the trivial solution.

Answer: 


True 


(h)  If  the  reduced  row  echelon  form  of  the  augmented  matrix  for  a linear  system  has  a row  of  zeros,  then  the  system  must 
have  infinitely  many  solutions. 

Answer: 

False 

(i)  If  a linear  system  has  more  unknowns  than  equations,  then  it  must  have  infinitely  many  solutions. 

Answer: 

False 
1.3 Matrices and Matrix Operations

Rectangular  arrays  of  real  numbers  arise  in  contexts  other  than  as  augmented  matrices  for  linear  systems.  In  this 
section  we  will  begin  to  study  matrices  as  objects  in  their  own  right  by  defining  operations  of  addition,  subtraction, 
and  multiplication  on  them. 


Matrix  Notation  and  Terminology 

In  Section  1.2  we  used  rectangular  arrays  of  numbers,  called  augmented  matrices , to  abbreviate  systems  of  linear 
equations.  However,  rectangular  arrays  of  numbers  occur  in  other  contexts  as  well.  For  example,  the  following 
rectangular  array  with  three  rows  and  seven  columns  might  describe  the  number  of  hours  that  a student  spent  studying 
three  subjects  during  a certain  week: 


            Mon.  Tues.  Wed.  Thurs.  Fri.  Sat.  Sun.
Math         2     3      2     4       1     4     2
History      0     3      1     4       3     2     2
Language     4     1      3     1       0     0     2

If we suppress the headings, then we are left with the following rectangular array of numbers with three rows and
seven columns, called a “matrix”:

\[ \begin{bmatrix} 2 & 3 & 2 & 4 & 1 & 4 & 2 \\ 0 & 3 & 1 & 4 & 3 & 2 & 2 \\ 4 & 1 & 3 & 1 & 0 & 0 & 2 \end{bmatrix} \]

More  generally,  we  make  the  following  definition. 

DEFINITION  1 

A matrix  is  a rectangular  array  of  numbers.  The  numbers  in  the  array  are  called  the  entries  in  the  matrix. 


A matrix with only one column is called a column
vector or a column matrix, and a matrix with only
one row is called a row vector or a row matrix. In
Example 1, the 2 × 1 matrix is a column vector, the
1 × 4 matrix is a row vector, and the 1 × 1 matrix
is both a row vector and a column vector.


EXAMPLE  1 Examples  of  Matrices 


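Some examples of matrices are

\[ \begin{bmatrix} 1 & 2 \\ 3 & 0 \\ -1 & 4 \end{bmatrix}, \quad \begin{bmatrix} 2 & 1 & 0 & -3 \end{bmatrix}, \quad \begin{bmatrix} e & \pi & -\sqrt{2} \\ 0 & \tfrac{1}{2} & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad [4] \]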

The size of a matrix is described in terms of the number of rows (horizontal lines) and columns (vertical lines) it
contains. For example, the first matrix in Example 1 has three rows and two columns, so its size is 3 by 2 (written
3 × 2). In a size description, the first number always denotes the number of rows, and the second denotes the number
of columns. The remaining matrices in Example 1 have sizes 1 × 4, 3 × 3, 2 × 1, and 1 × 1, respectively.


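We will use capital letters to denote matrices and lowercase letters to denote numerical quantities; thus we might write

\[ A = \begin{bmatrix} 2 & 1 & 7 \\ 3 & 4 & 2 \end{bmatrix} \quad \text{or} \quad C = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix} \]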


When discussing matrices, it is common to refer to numerical quantities as scalars. Unless stated otherwise, scalars
will be real numbers; complex scalars will be considered later in the text.


Matrix  brackets  are  often  omitted  from  1x1 
matrices,  making  it  impossible  to  tell,  for  example, 
whether  the  symbol  4 denotes  the  number  “four”  or 
the  matrix  [4].  This  rarely  causes  problems  because 
it  is  usually  possible  to  tell  which  is  meant  from  the 
context. 


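The entry that occurs in row i and column j of a matrix A will be denoted by $a_{ij}$. Thus a general 3 × 4 matrix might be
written as

\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix} \]

and a general m × n matrix as

\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \tag{1} \]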

When a compact notation is desired, the preceding matrix can be written as

\[ [a_{ij}]_{m \times n} \quad \text{or} \quad [a_{ij}] \]

the first notation being used when it is important in the discussion to know the size, and the second being used when
the size need not be emphasized. Usually, we will match the letter denoting a matrix with the letter denoting its
entries; thus, for a matrix B we would generally use $b_{ij}$ for the entry in row i and column j, and for a matrix C we
would use the notation $c_{ij}$.


The entry in row i and column j of a matrix A is also commonly denoted by the symbol $(A)_{ij}$. Thus, for matrix (1)
above, we have

\[ (A)_{ij} = a_{ij} \]

and for the matrix

\[ A = \begin{bmatrix} 2 & -3 \\ 7 & 0 \end{bmatrix} \]

we have $(A)_{11} = 2$, $(A)_{12} = -3$, $(A)_{21} = 7$, and $(A)_{22} = 0$.


Row and column vectors are of special importance, and it is common practice to denote them by boldface lowercase
letters rather than capital letters. For such matrices, double subscripting of the entries is unnecessary. Thus a general
1 × n row vector a and a general m × 1 column vector b would be written as

\[ \mathbf{a} = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} \]


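A matrix A with n rows and n columns is called a square matrix of order n, and the shaded entries $a_{11}, a_{22}, \ldots, a_{nn}$
in (2) are said to be on the main diagonal of A.

\[ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \tag{2} \]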

Operations  on  Matrices 

So  far,  we  have  used  matrices  to  abbreviate  the  work  in  solving  systems  of  linear  equations.  For  other  applications, 
however,  it  is  desirable  to  develop  an  “arithmetic  of  matrices”  in  which  matrices  can  be  added,  subtracted,  and 
multiplied  in  a useful  way.  The  remainder  of  this  section  will  be  devoted  to  developing  this  arithmetic. 


DEFINITION  2 

Two  matrices  are  defined  to  be  equal  if  they  have  the  same  size  and  their  corresponding  entries  are  equal. 




The equality of two matrices

$A = [a_{ij}]$ and $B = [b_{ij}]$

of the same size can be expressed either by writing

$(A)_{ij} = (B)_{ij}$

or by writing

$a_{ij} = b_{ij}$

where it is understood that the equalities hold for
all values of i and j.


EXAMPLE  2 Equality  of  Matrices 


Consider the matrices

\[ A = \begin{bmatrix} 2 & 1 \\ 3 & x \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 1 \\ 3 & 5 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & 1 & 0 \\ 3 & 4 & 0 \end{bmatrix} \]

If x = 5, then A = B, but for all other values of x the matrices A and B are not equal, since not all of
their corresponding entries are equal. There is no value of x for which A = C since A and C have
different sizes.




DEFINITION  3 

If A and B are matrices of the same size, then the sum A + B is the matrix obtained by adding the entries of B
to the corresponding entries of A, and the difference A − B is the matrix obtained by subtracting the entries of
B from the corresponding entries of A. Matrices of different sizes cannot be added or subtracted.


In matrix notation, if $A = [a_{ij}]$ and $B = [b_{ij}]$ have the same size, then

\[ (A + B)_{ij} = (A)_{ij} + (B)_{ij} = a_{ij} + b_{ij} \quad \text{and} \quad (A - B)_{ij} = (A)_{ij} - (B)_{ij} = a_{ij} - b_{ij} \]

EXAMPLE  3 Addition  and  Subtraction 

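Consider the matrices

\[ A = \begin{bmatrix} 2 & 1 & 0 & 3 \\ -1 & 0 & 2 & 4 \\ 4 & -2 & 7 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} -4 & 3 & 5 & 1 \\ 2 & 2 & 0 & -1 \\ 3 & 2 & -4 & 5 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix} \]

Then

\[ A + B = \begin{bmatrix} -2 & 4 & 5 & 4 \\ 1 & 2 & 2 & 3 \\ 7 & 0 & 3 & 5 \end{bmatrix} \quad \text{and} \quad A - B = \begin{bmatrix} 6 & -2 & -5 & 2 \\ -3 & -2 & 2 & 5 \\ 1 & -4 & 11 & -5 \end{bmatrix} \]

The expressions A + C, B + C, A − C, and B − C are undefined.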
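Definition 3 translates directly into code. The sketch below is illustrative only: matrices are stored as Python lists of rows, and the helper names same_size, add, and subtract are our own choices rather than notation from the text.

```python
def same_size(A, B):
    """True when A and B have the same number of rows and columns."""
    return len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))

def add(A, B):
    """Entrywise sum A + B; defined only for matrices of the same size."""
    if not same_size(A, B):
        raise ValueError("matrices of different sizes cannot be added")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def subtract(A, B):
    """Entrywise difference A - B; defined only for matrices of the same size."""
    if not same_size(A, B):
        raise ValueError("matrices of different sizes cannot be subtracted")
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Matrices from Example 3.
A = [[2, 1, 0, 3], [-1, 0, 2, 4], [4, -2, 7, 0]]
B = [[-4, 3, 5, 1], [2, 2, 0, -1], [3, 2, -4, 5]]
C = [[1, 1], [2, 2]]

print(add(A, B))       # [[-2, 4, 5, 4], [1, 2, 2, 3], [7, 0, 3, 5]]
print(subtract(A, B))  # [[6, -2, -5, 2], [-3, -2, 2, 5], [1, -4, 11, -5]]
```

Calling add(A, C) raises ValueError, mirroring the statement that A + C is undefined.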




DEFINITION  4 

If  A is  any  matrix  and  c is  any  scalar,  then  the  product  cA  is  the  matrix  obtained  by  multiplying  each  entry  of 
the  matrix  A by  c.  The  matrix  cA  is  said  to  be  a scalar  multiple  of  A. 


In matrix notation, if $A = [a_{ij}]$, then

\[ (cA)_{ij} = c(A)_{ij} = ca_{ij} \]


EXAMPLE  4 Scalar  Multiples 


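For the matrices

\[ A = \begin{bmatrix} 2 & 3 & 4 \\ 1 & 3 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 2 & 7 \\ -1 & 3 & -5 \end{bmatrix}, \quad C = \begin{bmatrix} 9 & -6 & 3 \\ 3 & 0 & 12 \end{bmatrix} \]

we have

\[ 2A = \begin{bmatrix} 4 & 6 & 8 \\ 2 & 6 & 2 \end{bmatrix}, \quad (-1)B = \begin{bmatrix} 0 & -2 & -7 \\ 1 & -3 & 5 \end{bmatrix}, \quad \tfrac{1}{3}C = \begin{bmatrix} 3 & -2 & 1 \\ 1 & 0 & 4 \end{bmatrix} \]

It is common practice to denote (−1)B by −B.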
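Scalar multiplication is equally short to sketch. The function name scalar_multiple is hypothetical, the first row of A is taken to be 2, 3, 4 (an assumption consistent with the second row of 2A shown in Example 4), and Fraction is used so that the scalar 1/3 is handled exactly.

```python
from fractions import Fraction

def scalar_multiple(c, A):
    """Return cA: every entry of A multiplied by the scalar c."""
    return [[c * a for a in row] for row in A]

# Matrices from Example 4 (first row of A assumed to be 2, 3, 4).
A = [[2, 3, 4], [1, 3, 1]]
B = [[0, 2, 7], [-1, 3, -5]]
C = [[9, -6, 3], [3, 0, 12]]

two_A = scalar_multiple(2, A)
neg_B = scalar_multiple(-1, B)
third_C = scalar_multiple(Fraction(1, 3), C)

print(two_A)                               # [[4, 6, 8], [2, 6, 2]]
print(neg_B)                               # [[0, -2, -7], [1, -3, 5]]
print([[int(v) for v in row] for row in third_C])  # [[3, -2, 1], [1, 0, 4]]
```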


Thus  far  we  have  defined  multiplication  of  a matrix  by  a scalar  but  not  the  multiplication  of  two  matrices.  Since 
matrices  are  added  by  adding  corresponding  entries  and  subtracted  by  subtracting  corresponding  entries,  it  would 
seem  natural  to  define  multiplication  of  matrices  by  multiplying  corresponding  entries.  However,  it  turns  out  that  such 
a definition  would  not  be  very  useful  for  most  problems.  Experience  has  led  mathematicians  to  the  following  more 
useful  definition  of  matrix  multiplication. 


DEFINITION  5 

If A is an m × r matrix and B is an r × n matrix, then the product AB is the m × n matrix whose entries are
determined as follows: To find the entry in row i and column j of AB, single out row i from the matrix A and
column j from the matrix B. Multiply the corresponding entries from the row and column together, and then
add up the resulting products.


EXAMPLE  5 Multiplying  Matrices 

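Consider the matrices

\[ A = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix} \]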

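Since A is a 2 × 3 matrix and B is a 3 × 4 matrix, the product AB is a 2 × 4 matrix. To determine, for
example, the entry in row 2 and column 3 of AB, we single out row 2 from A and column 3 from B.
Then, as illustrated below, we multiply corresponding entries together and add up these products.

\[ \begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix} \begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix} = \begin{bmatrix} \square & \square & \square & \square \\ \square & \square & 26 & \square \end{bmatrix} \]

\[ (2 \cdot 4) + (6 \cdot 3) + (0 \cdot 5) = 26 \]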


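The entry in row 1 and column 4 of AB is computed as follows:

\[ \begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix} \begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix} = \begin{bmatrix} \square & \square & \square & 13 \\ \square & \square & \square & \square \end{bmatrix} \]

\[ (1 \cdot 3) + (2 \cdot 1) + (4 \cdot 2) = 13 \]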

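The computations for the remaining entries are

\[ \begin{aligned} (1 \cdot 4) + (2 \cdot 0) + (4 \cdot 2) &= 12 \\ (1 \cdot 1) - (2 \cdot 1) + (4 \cdot 7) &= 27 \\ (1 \cdot 4) + (2 \cdot 3) + (4 \cdot 5) &= 30 \\ (2 \cdot 4) + (6 \cdot 0) + (0 \cdot 2) &= 8 \\ (2 \cdot 1) - (6 \cdot 1) + (0 \cdot 7) &= -4 \\ (2 \cdot 3) + (6 \cdot 1) + (0 \cdot 2) &= 12 \end{aligned} \]

\[ AB = \begin{bmatrix} 12 & 27 & 30 & 13 \\ 8 & -4 & 26 & 12 \end{bmatrix} \]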
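The row-column rule of Definition 5 can be sketched in a few lines. This is a plain illustration under our own naming (matmul is not a name used in the text), with matrices stored as lists of rows.

```python
def matmul(A, B):
    """Product AB by the row-column rule: entry (i, j) is the sum of the
    products of corresponding entries of row i of A and column j of B."""
    m, r, n = len(A), len(B), len(B[0])
    if any(len(row) != r for row in A):
        raise ValueError("columns of A must equal rows of B")
    return [
        [sum(A[i][k] * B[k][j] for k in range(r)) for j in range(n)]
        for i in range(m)
    ]

# Matrices from Example 5.
A = [[1, 2, 4], [2, 6, 0]]
B = [[4, 1, 4, 3], [0, -1, 3, 1], [2, 7, 5, 2]]

AB = matmul(A, B)
print(AB)        # [[12, 27, 30, 13], [8, -4, 26, 12]]
print(AB[1][2])  # 26, the entry in row 2 and column 3
```

Python indexes from 0, so the entry in row 2 and column 3 of AB is AB[1][2].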


The definition of matrix multiplication requires that the number of columns of the first factor A be the same as the
number of rows of the second factor B in order to form the product AB. If this condition is not satisfied, the product is
undefined. A convenient way to determine whether a product of two matrices is defined is to write down the size of
the first factor and, to the right of it, write down the size of the second factor. If, as in (3), the inside numbers are the
same, then the product is defined. The outside numbers then give the size of the product.

\[ \underset{m \times r}{A} \;\; \underset{r \times n}{B} \; = \; \underset{m \times n}{AB} \tag{3} \]

In (3), the two adjacent copies of r are the inside numbers, and m and n are the outside numbers.


Gotthold  Eisenstein  (1823-1852) 


The concept of matrix multiplication is due to the German mathematician Gotthold
Eisenstein, who introduced the idea around 1844 to simplify the process of making substitutions in linear
systems.  The  idea  was  then  expanded  on  and  formalized  by  Cayley  in  his  Memoir  on  the  Theory  of  Matrices 
that  was  published  in  1858.  Eisenstein  was  a pupil  of  Gauss,  who  ranked  him  as  the  equal  of  Isaac  Newton 
and  Archimedes.  However,  Eisenstein,  suffering  from  bad  health  his  entire  life,  died  at  age  30,  so  his  potential 
was  never  realized. 

[Image: Wikipedia]


EXAMPLE  6 Determining  Whether  a Product  Is  Defined 


Suppose that A, B, and C are matrices with the following sizes:

    A        B        C
  3 × 4    4 × 7    7 × 3

Then by (3), AB is defined and is a 3 × 7 matrix; BC is defined and is a 4 × 3 matrix; and CA is defined
and is a 7 × 4 matrix. The products AC, CB, and BA are all undefined.
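The inside/outside test can be written as a small helper. The function name product_size and the convention of returning None for an undefined product are assumptions made for this sketch.

```python
def product_size(size_first, size_second):
    """Given sizes (rows, columns) of two factors, return the size of their
    product, or None when the inside numbers differ (product undefined)."""
    m, r_first = size_first
    r_second, n = size_second
    if r_first != r_second:
        return None
    return (m, n)  # the outside numbers

# Sizes from Example 6.
sizes = {"A": (3, 4), "B": (4, 7), "C": (7, 3)}

for left, right in [("A", "B"), ("B", "C"), ("C", "A"),
                    ("A", "C"), ("C", "B"), ("B", "A")]:
    print(left + right, product_size(sizes[left], sizes[right]))
```

The first three products report sizes (3, 7), (4, 3), and (7, 4); the last three report None.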


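In general, if $A = [a_{ij}]$ is an m × r matrix and $B = [b_{ij}]$ is an r × n matrix, then, as illustrated by the shading in (4),

\[ AB = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1r} \\ a_{21} & a_{22} & \cdots & a_{2r} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{ir} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mr} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1j} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2j} & \cdots & b_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ b_{r1} & b_{r2} & \cdots & b_{rj} & \cdots & b_{rn} \end{bmatrix} \tag{4} \]

(row i of A and column j of B are the shaded entries) the entry $(AB)_{ij}$ in row i and column j of AB is given by

\[ (AB)_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + a_{i3}b_{3j} + \cdots + a_{ir}b_{rj} \tag{5} \]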


Partitioned  Matrices 


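A matrix can be subdivided or partitioned into smaller matrices by inserting horizontal and vertical rules between
selected rows and columns. For example, the following are three possible partitions of a general 3 × 4 matrix A: the
first is a partition of A into four submatrices $A_{11}$, $A_{12}$, $A_{21}$, and $A_{22}$; the second is a partition of A into its row vectors
$\mathbf{r}_1$, $\mathbf{r}_2$, and $\mathbf{r}_3$; and the third is a partition of A into its column vectors $\mathbf{c}_1$, $\mathbf{c}_2$, $\mathbf{c}_3$, and $\mathbf{c}_4$:

\[ A = \left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ \hline a_{31} & a_{32} & a_{33} & a_{34} \end{array}\right] = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \]

\[ A = \left[\begin{array}{cccc} a_{11} & a_{12} & a_{13} & a_{14} \\ \hline a_{21} & a_{22} & a_{23} & a_{24} \\ \hline a_{31} & a_{32} & a_{33} & a_{34} \end{array}\right] = \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \end{bmatrix} \]

\[ A = \left[\begin{array}{c|c|c|c} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{array}\right] = \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \mathbf{c}_3 & \mathbf{c}_4 \end{bmatrix} \]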


Matrix  Multiplication  by  Columns  and  by  Rows 

Partitioning  has  many  uses,  one  of  which  is  for  finding  particular  rows  or  columns  of  a matrix  product  AB  without 
computing  the  entire  product.  Specifically,  the  following  formulas,  whose  proofs  are  left  as  exercises,  show  how 
individual  column  vectors  of  AB  can  be  obtained  by  partitioning  B into  column  vectors  and  how  individual  row 
vectors  of  AB  can  be  obtained  by  partitioning  A into  row  vectors. 


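\[ AB = A \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \end{bmatrix} = \begin{bmatrix} A\mathbf{b}_1 & A\mathbf{b}_2 & \cdots & A\mathbf{b}_n \end{bmatrix} \tag{6} \]

(AB computed column by column)

\[ AB = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_m \end{bmatrix} B = \begin{bmatrix} \mathbf{a}_1 B \\ \mathbf{a}_2 B \\ \vdots \\ \mathbf{a}_m B \end{bmatrix} \tag{7} \]

(AB computed row by row)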


In  words,  these  formulas  state  that 

jth column vector of AB = A [jth column vector of B]    (8)

ith row vector of AB = [ith row vector of A] B    (9)


EXAMPLE  7 Example  5 Revisited 


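If A and B are the matrices in Example 5, then from (8) the second column vector of AB can be obtained
by the computation

\[ \begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 7 \end{bmatrix} = \begin{bmatrix} 27 \\ -4 \end{bmatrix} \]

Here the column vector on the left is the second column of B, and the result is the second column of AB.
From (9) the first row vector of AB can be obtained by the computation

\[ \begin{bmatrix} 1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix} = \begin{bmatrix} 12 & 27 & 30 & 13 \end{bmatrix} \]

Here the row vector on the left is the first row of A, and the result is the first row of AB.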
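Formulas (8) and (9) say that a single column or row of AB can be computed without forming the whole product. A minimal sketch follows; the helper names column, mat_vec, and vec_mat are our own, and rows and columns are numbered from 1 in column to match the text.

```python
def column(B, j):
    """The j-th column of B (j counted from 1) as a flat list."""
    return [row[j - 1] for row in B]

def mat_vec(A, x):
    """A times a column vector x, returned as a flat list."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def vec_mat(r, B):
    """A row vector r times B, returned as a flat list."""
    return [sum(r[k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]

# Matrices from Example 5.
A = [[1, 2, 4], [2, 6, 0]]
B = [[4, 1, 4, 3], [0, -1, 3, 1], [2, 7, 5, 2]]

second_column_of_AB = mat_vec(A, column(B, 2))  # formula (8) with j = 2
first_row_of_AB = vec_mat(A[0], B)              # formula (9) with i = 1

print(second_column_of_AB)  # [27, -4]
print(first_row_of_AB)      # [12, 27, 30, 13]
```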


Matrix  Products  as  Linear  Combinations 

We  have  discussed  three  methods  for  computing  a matrix  product  AB — entry  by  entry,  column  by  column,  and  row  by 
row.  The  following  definition  provides  yet  another  way  of  thinking  about  matrix  multiplication. 

DEFINITION 6

If $A_1, A_2, \ldots, A_r$ are matrices of the same size, and if $c_1, c_2, \ldots, c_r$ are scalars, then an expression of the
form

\[ c_1A_1 + c_2A_2 + \cdots + c_rA_r \]

is called a linear combination of $A_1, A_2, \ldots, A_r$ with coefficients $c_1, c_2, \ldots, c_r$.


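To see how matrix products can be viewed as linear combinations, let A be an m × n matrix and x an n × 1 column
vector, say

\[ A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \quad \text{and} \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \]

Then

\[ A\mathbf{x} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix} = x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} + \cdots + x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} \tag{10} \]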


This  proves  the  following  theorem. 


THEOREM  1.3.1 

If A is an m × n matrix, and if x is an n × 1 column vector, then the product Ax can be expressed as a linear
combination of the column vectors of A in which the coefficients are the entries of x.


EXAMPLE  8 Matrix  Products  as  Linear  Combinations 

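The matrix product

\[ \begin{bmatrix} -1 & 3 & 2 \\ 1 & 2 & -3 \\ 2 & 1 & -2 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix} \]

can be written as the following linear combination of column vectors

\[ 2 \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} - 1 \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} + 3 \begin{bmatrix} 2 \\ -3 \\ -2 \end{bmatrix} = \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix} \]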
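Theorem 1.3.1 can be checked numerically for Example 8. In this sketch the function name linear_combination is hypothetical; the product Ax is computed once by the row-column rule and once as a combination of the columns of A with the entries of x as coefficients.

```python
def linear_combination(coefficients, vectors):
    """Return c1*v1 + c2*v2 + ... for equal-length vectors given as lists."""
    length = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coefficients, vectors))
            for i in range(length)]

# Matrix and vector from Example 8.
A = [[-1, 3, 2], [1, 2, -3], [2, 1, -2]]
x = [2, -1, 3]

columns_of_A = [list(col) for col in zip(*A)]  # [[-1, 1, 2], [3, 2, 1], [2, -3, -2]]

by_rows = [sum(a * xi for a, xi in zip(row, x)) for row in A]  # row-column rule
by_columns = linear_combination(x, columns_of_A)               # Theorem 1.3.1

print(by_rows)     # [1, -9, -3]
print(by_columns)  # [1, -9, -3]
```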

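EXAMPLE 9 Columns of a Product AB as Linear Combinations


We showed in Example 5 that

\[ AB = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 6 & 0 \end{bmatrix} \begin{bmatrix} 4 & 1 & 4 & 3 \\ 0 & -1 & 3 & 1 \\ 2 & 7 & 5 & 2 \end{bmatrix} = \begin{bmatrix} 12 & 27 & 30 & 13 \\ 8 & -4 & 26 & 12 \end{bmatrix} \]

It follows from Formula (6) and Theorem 1.3.1 that the jth column vector of AB can be expressed as a
linear combination of the column vectors of A in which the coefficients in the linear combination are the
entries from the jth column of B. The computations are as follows:

\[ \begin{aligned} \begin{bmatrix} 12 \\ 8 \end{bmatrix} &= 4 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0 \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 2 \begin{bmatrix} 4 \\ 0 \end{bmatrix} \\ \begin{bmatrix} 27 \\ -4 \end{bmatrix} &= \begin{bmatrix} 1 \\ 2 \end{bmatrix} - \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 7 \begin{bmatrix} 4 \\ 0 \end{bmatrix} \\ \begin{bmatrix} 30 \\ 26 \end{bmatrix} &= 4 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 3 \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 5 \begin{bmatrix} 4 \\ 0 \end{bmatrix} \\ \begin{bmatrix} 13 \\ 12 \end{bmatrix} &= 3 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 2 \\ 6 \end{bmatrix} + 2 \begin{bmatrix} 4 \\ 0 \end{bmatrix} \end{aligned} \]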


Matrix  Form  of  a Linear  System 

Matrix multiplication has an important application to systems of linear equations. Consider a system of m linear
equations in n unknowns:

\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned} \]

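Since two matrices are equal if and only if their corresponding entries are equal, we can replace the m equations in
this system by the single matrix equation

\[ \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} \]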

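The m × 1 matrix on the left side of this equation can be written as a product to give

\[ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} \]

If we designate these matrices by A, x, and b, respectively, then we can replace the original system of m equations in
n unknowns by the single matrix equation

\[ A\mathbf{x} = \mathbf{b} \]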

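The matrix A in this equation is called the coefficient matrix of the system. The augmented matrix for the system is
obtained by adjoining b to A as the last column; thus the augmented matrix is

\[ [A \mid \mathbf{b}] = \left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right] \]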


The vertical bar in [A | b] is a convenient way to
separate A from b visually; it has no mathematical
significance.
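To make the matrix form concrete, the sketch below uses the system from Exercise 5 of Section 1.2 (our choice of example; the variable names are likewise our own). It stores the coefficient matrix A and the vector b, builds the augmented matrix by adjoining b as a last column, and confirms that the known solution satisfies Ax = b.

```python
# System from Exercise 5 of Section 1.2:
#    x1 +  x2 + 2x3 = 8
#   -x1 - 2x2 + 3x3 = 1
#   3x1 - 7x2 + 4x3 = 10
A = [[1, 1, 2], [-1, -2, 3], [3, -7, 4]]   # coefficient matrix
b = [8, 1, 10]                              # right-hand sides

# Augmented matrix [A | b]: adjoin b to A as the last column.
augmented = [row + [bi] for row, bi in zip(A, b)]

# The answer given for Exercise 5 is x1 = 3, x2 = 1, x3 = 2.
x = [3, 1, 2]
Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]

print(augmented)  # [[1, 1, 2, 8], [-1, -2, 3, 1], [3, -7, 4, 10]]
print(Ax == b)    # True
```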


Transpose  of  a Matrix 

We  conclude  this  section  by  defining  two  matrix  operations  that  have  no  analogs  in  the  arithmetic  of  real  numbers. 


DEFINITION  7 

If A is any m × n matrix, then the transpose of A, denoted by $A^T$, is defined to be the n × m matrix that results
by interchanging the rows and columns of A; that is, the first column of $A^T$ is the first row of A, the second
column of $A^T$ is the second row of A, and so forth.


EXAMPLE  10  Some  Transposes 


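The following are some examples of matrices and their transposes.

\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 3 \\ 1 & 4 \\ 5 & 6 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 3 & 5 \end{bmatrix}, \quad D = [4] \]

\[ A^T = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \\ a_{14} & a_{24} & a_{34} \end{bmatrix}, \quad B^T = \begin{bmatrix} 2 & 1 & 5 \\ 3 & 4 & 6 \end{bmatrix}, \quad C^T = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}, \quad D^T = [4] \]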

Observe that not only are the columns of $A^T$ the rows of A, but the rows of $A^T$ are the columns of A. Thus the entry in
row i and column j of $A^T$ is the entry in row j and column i of A; that is,

\[ (A^T)_{ij} = (A)_{ji} \tag{11} \]

Note the reversal of the subscripts.

In the special case where A is a square matrix, the transpose of A can be obtained by interchanging entries that are
symmetrically positioned about the main diagonal. In (12) we see that $A^T$ can also be obtained by “reflecting” A about
its main diagonal.

\[ A = \begin{bmatrix} 1 & -2 & 4 \\ 3 & 7 & 0 \\ -5 & 8 & 6 \end{bmatrix} \;\longrightarrow\; A^T = \begin{bmatrix} 1 & 3 & -5 \\ -2 & 7 & 8 \\ 4 & 0 & 6 \end{bmatrix} \tag{12} \]
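The transpose is easy to sketch with matrices stored as lists of rows. The function name transpose is our own choice; the final check confirms formula (11), the reversal of subscripts, for the square matrix in (12).

```python
def transpose(A):
    """Return the transpose of A: the rows of A become the columns."""
    return [list(col) for col in zip(*A)]

B = [[2, 3], [1, 4], [5, 6]]               # matrix B from Example 10
A = [[1, -2, 4], [3, 7, 0], [-5, 8, 6]]    # square matrix from (12)

B_T = transpose(B)
A_T = transpose(A)

print(B_T)  # [[2, 1, 5], [3, 4, 6]]
print(A_T)  # [[1, 3, -5], [-2, 7, 8], [4, 0, 6]]

# Formula (11): the (i, j) entry of the transpose is the (j, i) entry of A.
subscripts_reversed = all(
    A_T[i][j] == A[j][i] for i in range(3) for j in range(3)
)
print(subscripts_reversed)  # True
```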

DEFINITION  8 

If  A is  a square  matrix,  then  the  trace  of  A,  denoted  by  tr  (A),  is  defined  to  be  the  sum  of  the  entries  on  the 
main  diagonal  of  A.  The  trace  of  A is  undefined  if  A is  not  a square  matrix. 




James  Sylvester  (1814-1897) 


Arthur  Cayley  (1821-1895) 


The  term  matrix  was  first  used  by  the  English  mathematician  (and  lawyer)  James  Sylvester, 
who  defined  the  term  in  1850  to  be  an  “oblong  arrangement  of  terms.”  Sylvester  communicated  his  work  on 
matrices  to  a fellow  English  mathematician  and  lawyer  named  Arthur  Cayley,  who  then  introduced  some  of 
the  basic  operations  on  matrices  in  a book  entitled  Memoir  on  the  Theory  of  Matrices  that  was  published  in 
1858.  As  a matter  of  interest,  Sylvester,  who  was  Jewish,  did  not  get  his  college  degree  because  he  refused  to 
sign  a required  oath  to  the  Church  of  England.  He  was  appointed  to  a chair  at  the  University  of  Virginia  in  the 
United  States  but  resigned  after  swatting  a student  with  a stick  because  he  was  reading  a newspaper  in  class. 


Sylvester,  thinking  he  had  killed  the  student,  fled  back  to  England  on  the  first  available  ship.  Fortunately,  the 
student  was  not  dead,  just  in  shock! 

[Images: The Granger Collection, New York]


EXAMPLE  11  Trace  of  a Matrix 


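The following are examples of matrices and their traces.

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad B = \begin{bmatrix} -1 & 2 & 7 & 0 \\ 3 & 5 & -8 & 4 \\ 1 & 2 & 7 & -3 \\ 4 & -2 & 1 & 0 \end{bmatrix}$$

$$\operatorname{tr}(A) = a_{11} + a_{22} + a_{33}, \qquad \operatorname{tr}(B) = -1 + 5 + 7 + 0 = 11$$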
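Definition 8 translates directly into a short routine. The sketch below is an illustration only, assuming matrices are stored as lists of rows; the function name `trace` and the sample matrix are choices made here, not taken from the text.

```python
def trace(M):
    """Sum of the main-diagonal entries; undefined unless M is square."""
    if any(len(row) != len(M) for row in M):
        raise ValueError("trace is undefined for a non-square matrix")
    return sum(M[i][i] for i in range(len(M)))

# Illustrative 3 x 3 matrix (the matrix A of Formula (12)).
A = [[1, -2, 4],
     [3, 7, 0],
     [-5, 8, 6]]
print(trace(A))  # 14, since 1 + 7 + 6 = 14

# A 2 x 3 matrix has no trace.
try:
    trace([[1, 2, 3], [4, 5, 6]])
except ValueError as err:
    print(err)
```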


In  the  exercises  you  will  have  some  practice  working  with  the  transpose  and  trace  operations. 


Concept  Review 

Matrix 

Entries 

Column  vector  (or  column  matrix) 

Row  vector  (or  row  matrix) 

Square  matrix 
Main  diagonal 
Equal  matrices 

Matrix  operations:  sum,  difference,  scalar  multiplication 

Linear  combination  of  matrices 

Product  of  matrices  (matrix  multiplication) 

Partitioned  matrices 
Submatrices 
Row-column  method 
Column  method 
Row  method 

Coefficient  matrix  of  a linear  system 
Transpose 


Trace

Skills 


Determine  the  size  of  a given  matrix. 

Identify  the  row  vectors  and  column  vectors  of  a given  matrix. 

Perform  the  arithmetic  operations  of  matrix  addition,  subtraction,  scalar  multiplication,  and  multiplication. 
Determine  whether  the  product  of  two  given  matrices  is  defined. 

Compute  matrix  products  using  the  row-column  method,  the  column  method,  and  the  row  method. 

Express  the  product  of  a matrix  and  a column  vector  as  a linear  combination  of  the  columns  of  the  matrix. 
Express  a linear  system  as  a matrix  equation,  and  identify  the  coefficient  matrix. 

Compute  the  transpose  of  a matrix. 

Compute  the  trace  of  a square  matrix. 


Exercise  Set  1.3 

1.  Suppose  that  A,  B,  C,  D,  and  E are  matrices  with  the  following  sizes: 

A B C D E 

(4x5)  (4x5)  (5x2)  (4x2)  (5x4) 

In each part, determine whether the given matrix expression is defined. For those that are defined, give the size of the resulting matrix.

(a) BA

(b) AC + D

(c) AE + B

(d) AB + B

(e) E(A + B)

(f) E(AC)

(g) $E^T A$

(h) $(A^T + E)D$

Answer: 

(a)  Undefined 

(b)  4 x 2 

(c)  Undefined 

(d)  Undefined 

(e)  5x5 

(f)  5x2 

(g)  Undefined 

(h)  5x2 


2.  Suppose  that  A,  B , C,  D,  and  E are  matrices  with  the  following  sizes: 


A B C D E

(3x1)  (3x6)  (6x2)  (2x6)  (1x3)


In each part, determine whether the given matrix expression is defined. For those that are defined, give the size of the resulting matrix.

(a) EA

(b) $AB^T$

(c) $B^T(A + E^T)$

(d) 2A + C

(e) $(C^T + D)B^T$

(f) $CD + B^T E^T$

(g) $(BD^T)C^T$

(h) DC + EA


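3. Consider the matrices

$$A = \begin{bmatrix} 3 & 0 \\ -1 & 2 \\ 1 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & -1 \\ 0 & 2 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 4 & 2 \\ 3 & 1 & 5 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 5 & 2 \\ -1 & 0 & 1 \\ 3 & 2 & 4 \end{bmatrix}, \quad E = \begin{bmatrix} 6 & 1 & 3 \\ -1 & 1 & 2 \\ 4 & 1 & 3 \end{bmatrix}$$

In each part, compute the given expression (where possible).

(a) D + E

(b) D − E

(c) 5A

(d) −7C

(e) 2B − C

(f) 4E − 2D

(g) −3(D + 2E)

(h) A − A

(i) tr(D)

(j) tr(D − 3E)

(k) 4 tr(7B)

(l) tr(A)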


Answer: 


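(a) $\begin{bmatrix} 7 & 6 & 5 \\ -2 & 1 & 3 \\ 7 & 3 & 7 \end{bmatrix}$

(b) $\begin{bmatrix} -5 & 4 & -1 \\ 0 & -1 & -1 \\ -1 & 1 & 1 \end{bmatrix}$

(c) $\begin{bmatrix} 15 & 0 \\ -5 & 10 \\ 5 & 5 \end{bmatrix}$

(d) $\begin{bmatrix} -7 & -28 & -14 \\ -21 & -7 & -35 \end{bmatrix}$

(e) Undefined

(f) $\begin{bmatrix} 22 & -6 & 8 \\ -2 & 4 & 6 \\ 10 & 0 & 4 \end{bmatrix}$

(g) $\begin{bmatrix} -39 & -21 & -24 \\ 9 & -6 & -15 \\ -33 & -12 & -30 \end{bmatrix}$

(h) $\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$

(i) 5

(j) −25

(k) 168

(l) Undefined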


4.  Using  the  matrices  in  Exercise  3,  in  each  part  compute  the  given  expression  (where  possible). 

(a) $2A^T + C$

(b) $D^T - E^T$

(c) $(D - E)^T$

(d) $B^T + 5C^T$

(e) $\tfrac{1}{2}C^T - \tfrac{1}{4}A$

(f) $B - B^T$

(g) $2E^T - 3D^T$

(h) $(2E^T - 3D^T)^T$

(i) (CD)E

(j) C(BA)

(k) $\operatorname{tr}(DE^T)$

(l) tr(BC)


5.  Using  the  matrices  in  Exercise  3,  in  each  part  compute  the  given  expression  (where  possible). 

(a) AB

(b) BA

(c) (3E)D

(d) (AB)C

(e) A(BC)

(f) $CC^T$

(g) $(DA)^T$

(h) $(C^T B)A^T$

(i) $\operatorname{tr}(DD^T)$

(j) $\operatorname{tr}(4E^T - D)$

(k) $\operatorname{tr}(C^T A^T + 2E^T)$

(l) $\operatorname{tr}\big((EC^T)^T A\big)$

Answer: 


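(a) $\begin{bmatrix} 12 & -3 \\ -4 & 5 \\ 4 & 1 \end{bmatrix}$

(b) Undefined

(c) $\begin{bmatrix} 42 & 108 & 75 \\ 12 & -3 & 21 \\ 36 & 78 & 63 \end{bmatrix}$

(d) $\begin{bmatrix} 3 & 45 & 9 \\ 11 & -11 & 17 \\ 7 & 17 & 13 \end{bmatrix}$

(e) $\begin{bmatrix} 3 & 45 & 9 \\ 11 & -11 & 17 \\ 7 & 17 & 13 \end{bmatrix}$

(f) $\begin{bmatrix} 21 & 17 \\ 17 & 35 \end{bmatrix}$

(g) $\begin{bmatrix} 0 & -2 & 11 \\ 12 & 1 & 8 \end{bmatrix}$

(h) $\begin{bmatrix} 12 & 6 & 9 \\ 48 & -20 & 14 \\ 24 & 8 & 16 \end{bmatrix}$

(i) 61

(j) 35

(k) 28

(l) 99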

6.  Using  the  matrices  in  Exercise  3,  in  each  part  compute  the  given  expression  (where  possible). 

(a) $(2D^T - E)A$

(b) (4B)C + 2B

(c) $(-AC)^T + 5D^T$

(d) $(BA^T - 2C)^T$

(e) $B^T(CC^T - A^T A)$

(f) $D^T E^T - (ED)^T$

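7. Let

$$A = \begin{bmatrix} 3 & -2 & 7 \\ 6 & 5 & 4 \\ 0 & 4 & 9 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 6 & -2 & 4 \\ 0 & 1 & 3 \\ 7 & 7 & 5 \end{bmatrix}$$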

Use  the  row  method  or  column  method  (as  appropriate)  to  find 

(a)  the  first  row  of  AB. 

(b)  the  third  row  of  AB. 

(c)  the  second  column  of  AB. 

(d)  the  first  column  of  BA. 

(e)  the  third  row  of  AA. 

(f)  the  third  column  of  AA. 

Answer: 

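(a) $\begin{bmatrix} 67 & 41 & 41 \end{bmatrix}$

(b) $\begin{bmatrix} 63 & 67 & 57 \end{bmatrix}$

(c) $\begin{bmatrix} 41 \\ 21 \\ 67 \end{bmatrix}$

(d) $\begin{bmatrix} 6 \\ 6 \\ 63 \end{bmatrix}$

(e) $\begin{bmatrix} 24 & 56 & 97 \end{bmatrix}$

(f) $\begin{bmatrix} 76 \\ 98 \\ 97 \end{bmatrix}$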

8.  Referring  to  the  matrices  in  Exercise  7,  use  the  row  method  or  column  method  (as  appropriate)  to  find 

(a)  the  first  column  of  AB. 

(b)  the  third  column  of  BB. 

(c)  the  second  row  of  BB. 

(d)  the  first  column  of  AA. 

(e)  the  third  column  of  AB. 

(f)  the  first  row  of  BA . 

9.  Referring  to  the  matrices  A and  B in  Exercise  7,  and  Example  9, 

(a)  express each column vector of AA as a linear combination of the column vectors of A.

(b)  express  each  column  vector  of  BB  as  a linear  combination  of  the  column  vectors  of  B. 


Answer: 


10.  Referring  to  the  matrices  A and  B in  Exercise  7,  and  Example  9, 

(a)  express  each  column  vector  of  AB  as  a linear  combination  of  the  column  vectors  of  A. 

(b)  express  each  column  vector  of  BA  as  a linear  combination  of  the  column  vectors  of  B. 

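11. In each part, find matrices A, x, and b that express the given system of linear equations as a single matrix equation Ax = b, and write out this matrix equation.

(a) $$\begin{aligned} 2x_1 - 3x_2 + 5x_3 &= 7 \\ 9x_1 - x_2 + x_3 &= -1 \\ x_1 + 5x_2 + 4x_3 &= 0 \end{aligned}$$

(b) $$\begin{aligned} 4x_1 - 3x_3 + x_4 &= 1 \\ 5x_1 + x_2 - 8x_4 &= 3 \\ 2x_1 - 5x_2 + 9x_3 - x_4 &= 0 \\ 3x_2 - x_3 + 7x_4 &= 2 \end{aligned}$$

Answer:

(a) $\begin{bmatrix} 2 & -3 & 5 \\ 9 & -1 & 1 \\ 1 & 5 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 7 \\ -1 \\ 0 \end{bmatrix}$

(b) $\begin{bmatrix} 4 & 0 & -3 & 1 \\ 5 & 1 & 0 & -8 \\ 2 & -5 & 9 & -1 \\ 0 & 3 & -1 & 7 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 0 \\ 2 \end{bmatrix}$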

12. In each part, find matrices A, x, and b that express the given system of linear equations as a single matrix equation Ax = b, and write out this matrix equation.

(a) $$\begin{aligned} x_1 - 2x_2 + 3x_3 &= -3 \\ 2x_1 + x_2 &= 0 \\ -3x_2 + 4x_3 &= 1 \\ x_1 + x_3 &= 5 \end{aligned}$$

(b) $$\begin{aligned} 3x_1 + 3x_2 + 3x_3 &= -3 \\ -x_1 - 5x_2 - 2x_3 &= 3 \\ -4x_2 + x_3 &= 0 \end{aligned}$$

13.  In  each  part,  express  the  matrix  equation  as  a system  of  linear  equations. 

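(a) $\begin{bmatrix} 5 & 6 & -7 \\ -1 & -2 & 3 \\ 0 & 4 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 3 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 1 & 1 \\ 2 & 3 & 0 \\ 5 & -3 & -6 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ -9 \end{bmatrix}$

Answer:

(a) $$\begin{aligned} 5x_1 + 6x_2 - 7x_3 &= 2 \\ -x_1 - 2x_2 + 3x_3 &= 0 \\ 4x_2 - x_3 &= 3 \end{aligned}$$

(b) $$\begin{aligned} x_1 + x_2 + x_3 &= 2 \\ 2x_1 + 3x_2 &= 2 \\ 5x_1 - 3x_2 - 6x_3 &= -9 \end{aligned}$$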

14.  In  each  part,  express  the  matrix  equation  as  a system  of  linear  equations. 


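(a) $\begin{bmatrix} 3 & -1 & 2 \\ 4 & 3 & 7 \\ -2 & 1 & 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 4 \end{bmatrix}$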

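(b) $\begin{bmatrix} 3 & -2 & 0 & 1 \\ 5 & 0 & 2 & -2 \\ 3 & 1 & 4 & 7 \\ -2 & 5 & 1 & 6 \end{bmatrix}\begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$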

In  Exercises  15-16,  find  all  values  of  k , if  any,  that  satisfy  the  equation. 


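15. $\begin{bmatrix} k & 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 2 \\ 0 & 2 & -3 \end{bmatrix}\begin{bmatrix} k \\ 1 \\ 1 \end{bmatrix} = 0$

Answer:

k = −1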


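16. $\begin{bmatrix} 2 & 2 & k \end{bmatrix}\begin{bmatrix} 1 & 2 & 0 \\ 2 & 0 & 3 \\ 0 & 3 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 2 \\ k \end{bmatrix} = 0$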

In  Exercises  17-18,  solve  the  matrix  equation  for  a , b , c,  and  d. 


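17. $\begin{bmatrix} a & 3 \\ -1 & a + b \end{bmatrix} = \begin{bmatrix} 4 & d - 2c \\ d + 2c & -2 \end{bmatrix}$

Answer:

a = 4, b = −6, c = −1, d = 1

18. $\begin{bmatrix} a - b & b + a \\ 3d + c & 2d - c \end{bmatrix} = \begin{bmatrix} 8 & 1 \\ 7 & 6 \end{bmatrix}$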

19. Let A be any m × n matrix and let 0 be the m × n matrix each of whose entries is zero. Show that if kA = 0, then k = 0 or A = 0.

20. (a) Show that if AB and BA are both defined, then AB and BA are square matrices.

(b) Show that if A is an m × n matrix and A(BA) is defined, then B is an n × m matrix.

21. Prove: If A and B are n × n matrices, then tr(A + B) = tr(A) + tr(B).

22. (a) Show that if A has a row of zeros and B is any matrix for which AB is defined, then AB also has a row of zeros.

(b) Find a similar result involving a column of zeros.

23. In each part, find a 6 × 6 matrix $[a_{ij}]$ that satisfies the stated condition. Make your answers as general as possible by using letters rather than specific numbers for the nonzero entries.

(a) $a_{ij} = 0$ if $i \ne j$

(b) $a_{ij} = 0$ if $i > j$

(c) $a_{ij} = 0$ if $i < j$

(d) $a_{ij} = 0$ if $|i - j| > 1$

Answer: 

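(a) $\begin{bmatrix} a_{11} & 0 & 0 & 0 & 0 & 0 \\ 0 & a_{22} & 0 & 0 & 0 & 0 \\ 0 & 0 & a_{33} & 0 & 0 & 0 \\ 0 & 0 & 0 & a_{44} & 0 & 0 \\ 0 & 0 & 0 & 0 & a_{55} & 0 \\ 0 & 0 & 0 & 0 & 0 & a_{66} \end{bmatrix}$

(b) $\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} \\ 0 & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} \\ 0 & 0 & a_{33} & a_{34} & a_{35} & a_{36} \\ 0 & 0 & 0 & a_{44} & a_{45} & a_{46} \\ 0 & 0 & 0 & 0 & a_{55} & a_{56} \\ 0 & 0 & 0 & 0 & 0 & a_{66} \end{bmatrix}$

(c) $\begin{bmatrix} a_{11} & 0 & 0 & 0 & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 & 0 & 0 \\ a_{31} & a_{32} & a_{33} & 0 & 0 & 0 \\ a_{41} & a_{42} & a_{43} & a_{44} & 0 & 0 \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & 0 \\ a_{61} & a_{62} & a_{63} & a_{64} & a_{65} & a_{66} \end{bmatrix}$

(d) $\begin{bmatrix} a_{11} & a_{12} & 0 & 0 & 0 & 0 \\ a_{21} & a_{22} & a_{23} & 0 & 0 & 0 \\ 0 & a_{32} & a_{33} & a_{34} & 0 & 0 \\ 0 & 0 & a_{43} & a_{44} & a_{45} & 0 \\ 0 & 0 & 0 & a_{54} & a_{55} & a_{56} \\ 0 & 0 & 0 & 0 & a_{65} & a_{66} \end{bmatrix}$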

24. Find the 4 × 4 matrix $A = [a_{ij}]$ whose entries satisfy the stated condition.

(a) $a_{ij} = i + j$

(c) $a_{ij} = \begin{cases} 1 & \text{if } |i - j| > 1 \\ -1 & \text{if } |i - j| \le 1 \end{cases}$

25. Consider the function y = f(x) defined for 2 × 1 matrices x by y = Ax, where


Plot f(x) together with x in each case below. How would you describe the action of f?

“'■(!) 

(b)  __  f2\ 

[oj 


(c)  x= 


2 


1 - 


r 


26. Let I be the n × n matrix whose entry in row i and column j is

$$\begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}$$

Show that AI = IA = A for every n × n matrix A.

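27. How many 3 × 3 matrices A can you find such that

$$A\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + y \\ x - y \\ 0 \end{bmatrix}$$

for all choices of x, y, and z?

Answer:

One; namely, $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$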

28. How many 3 × 3 matrices A can you find such that

$$A\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} xy \\ 0 \\ 0 \end{bmatrix}$$

for all choices of x, y, and z?

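29. A matrix B is said to be a square root of a matrix A if BB = A.

(a) Find two square roots of $A = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}$.

(b) How many different square roots can you find of $A = \begin{bmatrix} 5 & 0 \\ 0 & 9 \end{bmatrix}$?

(c) Do you think that every 2 × 2 matrix has at least one square root? Explain your reasoning.

Answer:

(a) $\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$ and $\begin{bmatrix} -1 & -1 \\ -1 & -1 \end{bmatrix}$

(b) Four: $\begin{bmatrix} \sqrt{5} & 0 \\ 0 & 3 \end{bmatrix}$, $\begin{bmatrix} -\sqrt{5} & 0 \\ 0 & 3 \end{bmatrix}$, $\begin{bmatrix} \sqrt{5} & 0 \\ 0 & -3 \end{bmatrix}$, $\begin{bmatrix} -\sqrt{5} & 0 \\ 0 & -3 \end{bmatrix}$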

30. Let 0 denote a 2 × 2 matrix, each of whose entries is zero.

(a) Is there a 2 × 2 matrix A such that A ≠ 0 and AA = 0? Justify your answer.

(b) Is there a 2 × 2 matrix A such that A ≠ 0 and AA = A? Justify your answer.


True-False  Exercises 


In  parts  (a)-(o)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a) The matrix

$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$

has no main diagonal.

Answer:

True

(b) An m × n matrix has m column vectors and n row vectors.

Answer:

False

(c) If A and B are 2 × 2 matrices, then AB = BA.

Answer:

False

(d) The ith row vector of a matrix product AB can be computed by multiplying A by the ith row vector of B.

Answer:

False

(e) For every matrix A, it is true that $(A^T)^T = A$.

Answer:

True

(f) If A and B are square matrices of the same order, then tr(AB) = tr(A)tr(B).

Answer:

False

(g) If A and B are square matrices of the same order, then $(AB)^T = A^T B^T$.

Answer:

False

(h) For every square matrix A, it is true that $\operatorname{tr}(A^T) = \operatorname{tr}(A)$.

Answer:

True

(i) If A is a 6 × 4 matrix and B is an m × n matrix such that $B^T A^T$ is a 2 × 6 matrix, then m = 4 and n = 2.

Answer:

True

(j) If A is an n × n matrix and c is a scalar, then tr(cA) = c tr(A).

Answer:

True

(k) If A, B, and C are matrices of the same size such that A − C = B − C, then A = B.

Answer:

True

(l) If A, B, and C are square matrices of the same order such that AC = BC, then A = B.

Answer:

False

(m) If AB + BA is defined, then A and B are square matrices of the same size.

Answer: 

True 

(n)  If  B has  a column  of  zeros,  then  so  does  AB  if  this  product  is  defined. 

Answer: 

True 

(o)  If  B has  a column  of  zeros,  then  so  does  BA  if  this  product  is  defined. 

Answer: 

False 


Copyright  © 2010  John  Wiley  & Sons,  Inc.  All  rights  reserved. 


1.4  Inverses;  Algebraic  Properties  of  Matrices 

In  this  section  we  will  discuss  some  of  the  algebraic  properties  of  matrix  operations.  We  will  see  that  many  of 
the  basic  rules  of  arithmetic  for  real  numbers  hold  for  matrices,  but  we  will  also  see  that  some  do  not. 


Properties  of  Matrix  Addition  and  Scalar  Multiplication 

The  following  theorem  lists  the  basic  algebraic  properties  of  the  matrix  operations. 


THEOREM 1.4.1  Properties of Matrix Arithmetic

Assuming that the sizes of the matrices are such that the indicated operations can be performed, the following rules of matrix arithmetic are valid.

(a) A + B = B + A  (Commutative law for addition)
(b) A + (B + C) = (A + B) + C  (Associative law for addition)
(c) A(BC) = (AB)C  (Associative law for multiplication)
(d) A(B + C) = AB + AC  (Left distributive law)
(e) (B + C)A = BA + CA  (Right distributive law)
(f) A(B − C) = AB − AC
(g) (B − C)A = BA − CA
(h) a(B + C) = aB + aC
(i) a(B − C) = aB − aC
(j) (a + b)C = aC + bC
(k) (a − b)C = aC − bC
(l) a(bC) = (ab)C
(m) a(BC) = (aB)C = B(aC)


To  prove  any  of  the  equalities  in  this  theorem  we  must  show  that  the  matrix  on  the  left  side  has  the  same  size 
as  that  on  the  right  and  that  the  corresponding  entries  on  the  two  sides  are  the  same.  Most  of  the  proofs  follow 
the  same  pattern,  so  we  will  prove  part  (d)  as  a sample.  The  proof  of  the  associative  law  for  multiplication  is 
more  complicated  than  the  rest  and  is  outlined  in  the  exercises. 

There  are  three  basic  ways  to  prove  that  two 
matrices  of  the  same  size  are  equal — prove  that 
corresponding  entries  are  the  same,  prove  that 
corresponding  row  vectors  are  the  same,  or 
prove  that  corresponding  column  vectors  are 
the  same. 


Proof (d)  We must show that A(B + C) and AB + AC have the same size and that corresponding entries are equal. To form A(B + C), the matrices B and C must have the same size, say m × n, and the matrix A must then have m columns, so its size must be of the form r × m. This makes A(B + C) an r × n matrix. It follows that AB + AC is also an r × n matrix and, consequently, A(B + C) and AB + AC have the same size.

Suppose that $A = [a_{ij}]$, $B = [b_{ij}]$, and $C = [c_{ij}]$. We want to show that corresponding entries of A(B + C) and AB + AC are equal; that is,

$$[A(B + C)]_{ij} = [AB + AC]_{ij}$$

for all values of i and j. But from the definitions of matrix addition and matrix multiplication, we have

$$\begin{aligned} [A(B + C)]_{ij} &= a_{i1}(b_{1j} + c_{1j}) + a_{i2}(b_{2j} + c_{2j}) + \cdots + a_{im}(b_{mj} + c_{mj}) \\ &= (a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}) + (a_{i1}c_{1j} + a_{i2}c_{2j} + \cdots + a_{im}c_{mj}) \\ &= [AB]_{ij} + [AC]_{ij} = [AB + AC]_{ij} \end{aligned}$$
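The entrywise argument in the proof can be spot-checked numerically. The sketch below is only an illustration with arbitrarily chosen integer matrices; the helpers `matmul` and `matadd` and the sample entries are assumptions made here, and a numerical check is of course not a substitute for the proof.

```python
def matmul(X, Y):
    """Row-column product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def matadd(X, Y):
    """Entrywise sum of two matrices of the same size."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Arbitrary sample matrices: A is r x m (2 x 3); B and C are m x n (3 x 2).
A = [[1, 2, 0], [-1, 3, 4]]
B = [[2, 1], [0, -1], [5, 3]]
C = [[1, 0], [4, 2], [-2, 6]]

left = matmul(A, matadd(B, C))               # A(B + C)
right = matadd(matmul(A, B), matmul(A, C))   # AB + AC
print(left == right)                         # True
print(len(left), len(left[0]))               # 2 2, i.e. an r x n matrix
```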


Although the operations of matrix addition and matrix multiplication were defined for pairs of matrices, associative laws (b) and (c) enable us to denote sums and products of three matrices as A + B + C and ABC without inserting any parentheses. This is justified by the fact that no matter how parentheses are inserted, the associative laws guarantee that the same end result will be obtained. In general, given any sum or any product of matrices, pairs of parentheses can be inserted or deleted anywhere within the expression without affecting the end result.


EXAMPLE  1 Associativity  of  Matrix  Multiplication 


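As an illustration of the associative law for matrix multiplication, consider

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}$$

Then

$$AB = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 8 & 5 \\ 20 & 13 \\ 2 & 1 \end{bmatrix} \quad\text{and}\quad BC = \begin{bmatrix} 4 & 3 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 10 & 9 \\ 4 & 3 \end{bmatrix}$$

Thus

$$(AB)C = \begin{bmatrix} 8 & 5 \\ 20 & 13 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 18 & 15 \\ 46 & 39 \\ 4 & 3 \end{bmatrix}$$

and

$$A(BC) = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 10 & 9 \\ 4 & 3 \end{bmatrix} = \begin{bmatrix} 18 & 15 \\ 46 & 39 \\ 4 & 3 \end{bmatrix}$$

so (AB)C = A(BC), as guaranteed by Theorem 1.4.1(c).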
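The arithmetic in this example is easy to reproduce by machine. The following sketch assumes matrices are stored as lists of rows; the helper name `matmul` is illustrative and not part of the text.

```python
def matmul(X, Y):
    """Row-column product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

# Matrices of Example 1.
A = [[1, 2], [3, 4], [0, 1]]
B = [[4, 3], [2, 1]]
C = [[1, 0], [2, 3]]

AB_C = matmul(matmul(A, B), C)   # (AB)C
A_BC = matmul(A, matmul(B, C))   # A(BC)
print(AB_C)          # [[18, 15], [46, 39], [4, 3]]
print(AB_C == A_BC)  # True
```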


Properties  of  Matrix  Multiplication 


Do  not  let  Theorem  1.4.1  lull  you  into  believing  that  all  laws  of  real  arithmetic  carry  over  to  matrix 
arithmetic.  For  example,  you  know  that  in  real  arithmetic  it  is  always  true  that  ab  = ba,  which  is  called  the 
commutative  law  for  multiplication.  In  matrix  arithmetic,  however,  the  equality  of  AB  and  BA  can  fail  for 
three  possible  reasons: 

AB may be defined and BA may not (for example, if A is 2 × 3 and B is 3 × 4).

AB and BA may both be defined, but they may have different sizes (for example, if A is 2 × 3 and B is 3 × 2).

AB and BA may both be defined and have the same size, but the two matrices may be different (as illustrated in the next example).

Do not read too much into Example 2. It does not rule out the possibility that AB and BA may be equal in certain cases, just that they are not equal in all cases. If it so happens that AB = BA, then we say that A and B commute.


EXAMPLE  2 Order  Matters  in  Matrix  Multiplication 


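Consider the matrices

$$A = \begin{bmatrix} -1 & 0 \\ 2 & 3 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 2 \\ 3 & 0 \end{bmatrix}$$

Multiplying gives

$$AB = \begin{bmatrix} -1 & -2 \\ 11 & 4 \end{bmatrix} \quad\text{and}\quad BA = \begin{bmatrix} 3 & 6 \\ -3 & 0 \end{bmatrix}$$

Thus, AB ≠ BA.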
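A quick computation confirms that the order of the factors matters here. The sketch below is illustrative: the helper `matmul` is an assumption, and the entries of B are the ones consistent with the product BA stated in the example.

```python
def matmul(X, Y):
    """Row-column product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[-1, 0], [2, 3]]
B = [[1, 2], [3, 0]]   # entries consistent with BA = [[3, 6], [-3, 0]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB)        # [[-1, -2], [11, 4]]
print(BA)        # [[3, 6], [-3, 0]]
print(AB == BA)  # False
```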


Zero  Matrices 


A matrix whose entries are all zero is called a zero matrix. Some examples are

$$\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad [0]$$

We will denote a zero matrix by 0 unless it is important to specify its size, in which case we will denote the m × n zero matrix by $0_{m \times n}$.

It should be evident that if A and 0 are matrices with the same size, then

A + 0 = 0 + A = A

Thus, 0 plays the same role in this matrix equation that the number 0 plays in the numerical equation a + 0 = 0 + a = a.


The  following  theorem  lists  the  basic  properties  of  zero  matrices.  Since  the  results  should  be  self-evident,  we 
will  omit  the  formal  proofs. 


THEOREM 1.4.2  Properties of Zero Matrices

If c is a scalar, and if the sizes of the matrices are such that the operations can be performed, then:

(a) A + 0 = 0 + A = A

(b) A − 0 = A

(c) A − A = A + (−A) = 0

(d) 0A = 0

(e) If cA = 0, then c = 0 or A = 0.


Since  we  know  that  the  commutative  law  of  real  arithmetic  is  not  valid  in  matrix  arithmetic,  it  should  not  be 
surprising  that  there  are  other  rules  that  fail  as  well.  For  example,  consider  the  following  two  laws  of  real 
arithmetic: 

If ab = ac and a ≠ 0, then b = c. [The cancellation law]

If ab = 0, then at least one of the factors on the left is 0.

The  next  two  examples  show  that  these  laws  are  not  universally  true  in  matrix  arithmetic. 

EXAMPLE  3 Failure  of  the  Cancellation  Law 


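Consider the matrices

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 1 \\ 3 & 4 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & 5 \\ 3 & 4 \end{bmatrix}$$

We leave it for you to confirm that

$$AB = AC = \begin{bmatrix} 3 & 4 \\ 6 & 8 \end{bmatrix}$$

Although A ≠ 0, canceling A from both sides of the equation AB = AC would lead to the incorrect conclusion that B = C. Thus, the cancellation law does not hold, in general, for matrix multiplication.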


EXAMPLE  4 A Zero  Product  with  Nonzero  Factors 


Here are two matrices for which AB = 0, but A ≠ 0 and B ≠ 0:

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 7 \\ 0 & 0 \end{bmatrix}$$
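Both failures can be confirmed with a few lines of code. This is a sketch under stated assumptions: the helper `matmul` is illustrative, and the matrices below are sample choices for which AB = AC and AD = 0 hold; they are meant to mirror Examples 3 and 4 but should be read as illustrations.

```python
def matmul(X, Y):
    """Row-column product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[0, 1], [0, 2]]
B = [[1, 1], [3, 4]]
C = [[2, 5], [3, 4]]
D = [[3, 7], [0, 0]]

# Cancellation fails: AB = AC although A is nonzero and B differs from C.
print(matmul(A, B))                  # [[3, 4], [6, 8]]
print(matmul(A, B) == matmul(A, C))  # True
print(B == C)                        # False

# A zero product with nonzero factors.
print(matmul(A, D))                  # [[0, 0], [0, 0]]
```

In both cases the culprit is the zero first column of A, which makes the product ignore the first row of the right-hand factor.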


Identity  Matrices 


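A square matrix with 1's on the main diagonal and zeros elsewhere is called an identity matrix. Some examples are

$$[1], \quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

An identity matrix is denoted by the letter I. If it is important to emphasize the size, we will write $I_n$ for the n × n identity matrix.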


To  explain  the  role  of  identity  matrices  in  matrix  arithmetic,  let  us  consider  the  effect  of  multiplying  a general 
2x3  matrix  A on  each  side  by  an  identity  matrix.  Multiplying  on  the  right  by  the  3 x 3 identity  matrix  yields 


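$$AI_3 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = A$$

and multiplying on the left by the 2 × 2 identity matrix yields

$$I_2A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = A$$

The same result holds in general; that is, if A is any m × n matrix, then

$$AI_n = A \quad\text{and}\quad I_mA = A$$

Thus, the identity matrices play the same role in these matrix equations that the number 1 plays in the numerical equation a · 1 = 1 · a = a.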
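The role of the identity matrix can be illustrated with a concrete 2 × 3 matrix. In the sketch below the helper names `matmul` and `identity` and the sample entries of A are assumptions made for illustration.

```python
def matmul(X, Y):
    """Row-column product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def identity(n):
    """The n x n identity matrix: 1's on the main diagonal, zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [4, 5, 6]]            # an arbitrary 2 x 3 matrix

print(matmul(A, identity(3)) == A)    # True: multiplying on the right by I_3
print(matmul(identity(2), A) == A)    # True: multiplying on the left by I_2
```

Note that the two identity matrices have different sizes; the size is forced by which side of A the identity appears on.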


As  the  next  theorem  shows,  identity  matrices  arise  naturally  in  studying  reduced  row  echelon  forms  of  square 
matrices. 


THEOREM  1.4.3 

If  R is  the  reduced  row  echelon  form  of  an  n x n matrix  A,  then  either  R has  a row  of  zeros  or  R is  the 
identity  matrix  In. 


Proof  Suppose that the reduced row echelon form of A is

$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{bmatrix}$$

Either the last row in this matrix consists entirely of zeros or it does not. If not, the matrix contains no zero rows, and consequently each of the n rows has a leading entry of 1. Since these leading 1's occur progressively farther to the right as we move down the matrix, each of these 1's must occur on the main diagonal. Since the other entries in the same column as one of these 1's are zero, R must be $I_n$. Thus, either R has a row of zeros or $R = I_n$.


Inverse  of  a Matrix 

In real arithmetic every nonzero number a has a reciprocal $a^{-1}$ (= 1/a) with the property

$$a \cdot a^{-1} = a^{-1} \cdot a = 1$$

The number $a^{-1}$ is sometimes called the multiplicative inverse of a. Our next objective is to develop an analog of this result for matrix arithmetic. For this purpose we make the following definition.


DEFINITION  1 

If A is a square matrix, and if a matrix B of the same size can be found such that AB = BA = I, then A is said to be invertible (or nonsingular) and B is called an inverse of A. If no such matrix B can be found, then A is said to be singular.


The relationship AB = BA = I is not changed by interchanging A and B, so if A is invertible and B is an inverse of A, then it is also true that B is invertible, and A is an inverse of B. Thus, when

AB = BA = I

we say that A and B are inverses of one another.


EXAMPLE  5 An  Invertible  Matrix 


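Let

$$A = \begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix}$$

Then

$$AB = \begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix}\begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$$

$$BA = \begin{bmatrix} 3 & 5 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 2 & -5 \\ -1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$$

Thus, A and B are invertible and each is an inverse of the other.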
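Definition 1 asks for both products AB and BA to equal I, and that is straightforward to test. The following sketch is illustrative only; the helper names `matmul`, `identity`, and `are_inverses` are assumptions and do not come from the text.

```python
def matmul(X, Y):
    """Row-column product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def are_inverses(X, Y):
    """True when XY = YX = I for square matrices X and Y of the same size."""
    I = identity(len(X))
    return matmul(X, Y) == I and matmul(Y, X) == I

# Matrices of Example 5.
A = [[2, -5], [-1, 3]]
B = [[3, 5], [1, 2]]
print(are_inverses(A, B))  # True
```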


EXAMPLE  6 Class  of  Singular  Matrices 


In general, a square matrix with a row or column of zeros is singular. To help understand why this is so, consider the matrix

$$A = \begin{bmatrix} 1 & 4 & 0 \\ 2 & 5 & 0 \\ 3 & 6 & 0 \end{bmatrix}$$

To prove that A is singular we must show that there is no 3 × 3 matrix B such that AB = BA = I. For this purpose let $c_1$, $c_2$, 0 be the column vectors of A. Thus, for any 3 × 3 matrix B we can express the product BA as

$$BA = B[c_1 \;\; c_2 \;\; 0] = [Bc_1 \;\; Bc_2 \;\; 0] \qquad \text{[Formula (6) of Section 1.3]}$$

The column of zeros shows that BA ≠ I and hence that A is singular.


Properties  of  Inverses 

It  is  reasonable  to  ask  whether  an  invertible  matrix  can  have  more  than  one  inverse.  The  next  theorem  shows 
that  the  answer  is  no — an  invertible  matrix  has  exactly  one  inverse. 


THEOREM  1.4.4 

If B and C are both inverses of the matrix A, then B = C.


Proof  Since B is an inverse of A, we have BA = I. Multiplying both sides on the right by C gives (BA)C = IC = C. But it is also true that (BA)C = B(AC) = BI = B, so C = B.


As a consequence of this important result, we can now speak of “the” inverse of an invertible matrix. If A is invertible, then its inverse will be denoted by the symbol $A^{-1}$. Thus,

$$AA^{-1} = I \quad\text{and}\quad A^{-1}A = I \tag{1}$$

The inverse of A plays much the same role in matrix arithmetic that the reciprocal $a^{-1}$ plays in the numerical relationships $aa^{-1} = 1$ and $a^{-1}a = 1$.

In  the  next  section  we  will  develop  a method  for  computing  the  inverse  of  an  invertible  matrix  of  any  size. 
For  now  we  give  the  following  theorem  that  specifies  conditions  under  which  a 2 x 2 matrix  is  invertible  and 
provides  a simple  formula  for  its  inverse. 


THEOREM  1.4.5 

The matrix

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

is invertible if and only if ad − bc ≠ 0, in which case the inverse is given by the formula

$$A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \tag{2}$$


We will omit the proof, because we will study a more general version of this theorem later. For now, you should at least confirm the validity of Formula (2) by showing that $AA^{-1} = A^{-1}A = I$.


The formula for $A^{-1}$ given in Theorem 1.4.5 first appeared (in a more general form) in Arthur Cayley's 1858 Memoir on the Theory of Matrices. The more general result that Cayley discovered will be studied later.


The quantity ad − bc in Theorem 1.4.5 is called the determinant of the 2 × 2 matrix A and is denoted by

$$\det(A) = ad - bc$$

or alternatively by

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$$

Figure 1.4.1 illustrates that the determinant of a 2 × 2 matrix A is the product of the entries on its main diagonal minus the product of the entries off its main diagonal. In words, Theorem 1.4.5 states that a 2 × 2 matrix A is invertible if and only if its determinant is nonzero, and if invertible, then its inverse can be obtained by interchanging its diagonal entries, reversing the signs of its off-diagonal entries, and multiplying the entries by the reciprocal of the determinant of A.

[Figure 1.4.1: $\det(A) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$]


EXAMPLE  7 Calculating  the  Inverse  of  a 2 x 2 Matrix 


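In each part, determine whether the matrix is invertible. If so, find its inverse.

(a) $A = \begin{bmatrix} 6 & 1 \\ 5 & 2 \end{bmatrix}$

(b) $A = \begin{bmatrix} -1 & 2 \\ 3 & -6 \end{bmatrix}$

Solution

(a) The determinant of A is det(A) = (6)(2) − (1)(5) = 7, which is nonzero. Thus, A is invertible, and its inverse is

$$A^{-1} = \frac{1}{7}\begin{bmatrix} 2 & -1 \\ -5 & 6 \end{bmatrix} = \begin{bmatrix} \tfrac{2}{7} & -\tfrac{1}{7} \\ -\tfrac{5}{7} & \tfrac{6}{7} \end{bmatrix}$$

We leave it for you to confirm that $AA^{-1} = A^{-1}A = I$.

(b) The matrix is not invertible since det(A) = (−1)(−6) − (2)(3) = 0.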
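The test and formula of Theorem 1.4.5 can be written as a short function. The sketch below uses exact rational arithmetic from Python's standard `fractions` module; the function name `inverse_2x2` and the choice to return `None` for a singular matrix are assumptions made for illustration.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]] by the determinant formula, or None if singular."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        return None                  # ad - bc = 0: the matrix is not invertible
    # Swap the diagonal entries, negate the off-diagonal ones, divide by det.
    return [[d / det, -b / det],
            [-c / det, a / det]]

print(inverse_2x2([[6, 1], [5, 2]]))
# [[Fraction(2, 7), Fraction(-1, 7)], [Fraction(-5, 7), Fraction(6, 7)]]
print(inverse_2x2([[-1, 2], [3, -6]]))  # None, since the determinant is 0
```

Exact fractions are used instead of floating-point numbers so that the entries match the hand computation in part (a).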


EXAMPLE  8 Solution  of  a Linear  System  by  Matrix  Inversion 


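A problem that arises in many applications is to solve a pair of equations of the form

$$\begin{aligned} u &= ax + by \\ v &= cx + dy \end{aligned}$$

for x and y in terms of u and v. One approach is to treat this as a linear system of two equations in the unknowns x and y and use Gauss-Jordan elimination to solve for x and y. However, because the coefficients of the unknowns are literal rather than numerical, this procedure is a little clumsy. As an alternative approach, let us replace the two equations by the single matrix equation

$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}$$

which we can rewrite as

$$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$

If we assume that the 2 × 2 matrix is invertible (i.e., ad − bc ≠ 0), then we can multiply through on the left by the inverse and rewrite the equation as

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1}\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1}\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$

which simplifies to

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1}\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}$$

Using Theorem 1.4.5, we can rewrite this equation as

$$\frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}$$

from which we obtain

$$x = \frac{du - bv}{ad - bc}, \qquad y = \frac{av - cu}{ad - bc}$$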
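The closed-form solution can be tried on numbers. The sketch below is an illustration under stated assumptions: the function name `solve_2x2` and the sample coefficients are chosen here, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def solve_2x2(a, b, c, d, u, v):
    """Solve u = a*x + b*y, v = c*x + d*y for (x, y), assuming ad - bc != 0."""
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("coefficient matrix is not invertible")
    x = (d * u - b * v) / det
    y = (a * v - c * u) / det
    return x, y

# Sample system: 8 = 6x + y and 9 = 5x + 2y.
x, y = solve_2x2(6, 1, 5, 2, 8, 9)
print(x, y)  # 1 2
```

Substituting back gives 6(1) + 2 = 8 and 5(1) + 2(2) = 9, as required.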

The  next  theorem  is  concerned  with  inverses  of  matrix  products. 

THEOREM  1.4.6 

If A and B are invertible matrices with the same size, then AB is invertible and

$$(AB)^{-1} = B^{-1}A^{-1}$$


Proof  We can establish the invertibility and obtain the stated formula at the same time by showing that

$$(AB)(B^{-1}A^{-1}) = (B^{-1}A^{-1})(AB) = I$$

But

$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$$

and similarly, $(B^{-1}A^{-1})(AB) = I$.

Although we will not prove it, this result can be extended to three or more factors:

A product of any number of invertible matrices is invertible, and the inverse of the product is the product of the inverses in the reverse order.


EXAMPLE  9 The  Inverse  of  a Product 


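Consider the matrices

$$A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 2 \\ 2 & 2 \end{bmatrix}$$

We leave it for you to show that

$$AB = \begin{bmatrix} 7 & 6 \\ 9 & 8 \end{bmatrix}, \quad (AB)^{-1} = \begin{bmatrix} 4 & -3 \\ -\tfrac{9}{2} & \tfrac{7}{2} \end{bmatrix}$$

and also that

$$A^{-1} = \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix}, \quad B^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & \tfrac{3}{2} \end{bmatrix}, \quad B^{-1}A^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & \tfrac{3}{2} \end{bmatrix}\begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 4 & -3 \\ -\tfrac{9}{2} & \tfrac{7}{2} \end{bmatrix}$$

Thus, $(AB)^{-1} = B^{-1}A^{-1}$ as guaranteed by Theorem 1.4.6.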
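The reversal of order in Theorem 1.4.6 can be checked on the matrices of this example. The sketch below is illustrative; the helpers `matmul` and `inverse_2x2` are assumptions, and exact fractions keep the half-integer entries exact.

```python
from fractions import Fraction

def matmul(X, Y):
    """Row-column product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix by the determinant formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2], [1, 3]]
B = [[3, 2], [2, 2]]

lhs = inverse_2x2(matmul(A, B))                # (AB)^-1
rhs = matmul(inverse_2x2(B), inverse_2x2(A))   # B^-1 A^-1
wrong = matmul(inverse_2x2(A), inverse_2x2(B)) # A^-1 B^-1, the order NOT reversed

print(lhs == rhs)    # True
print(lhs == wrong)  # False for these matrices
```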


Powers  of  a Matrix 

If A is a square matrix, then we define the nonnegative integer powers of A to be

$$A^0 = I \quad\text{and}\quad A^n = AA \cdots A \quad [n \text{ factors}]$$

and if A is invertible, then we define the negative integer powers of A to be

$$A^{-n} = (A^{-1})^n = A^{-1}A^{-1} \cdots A^{-1} \quad [n \text{ factors}]$$

Because these definitions parallel those for real numbers, the usual laws of nonnegative exponents hold; for example,

$$A^rA^s = A^{r+s} \quad\text{and}\quad (A^r)^s = A^{rs}$$

If  a product  of  matrices  is  singular,  then  at  least 
one  of  the  factors  must  be  singular.  Why? 

In  addition,  we  have  the  following  properties  of  negative  exponents. 

THEOREM  1.4.7 

If A is invertible and n is a nonnegative integer, then:

(a) $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$.

(b) $A^n$ is invertible and $(A^n)^{-1} = A^{-n} = (A^{-1})^n$.

(c) kA is invertible for any nonzero scalar k, and $(kA)^{-1} = k^{-1}A^{-1}$.


We will prove part (c) and leave the proofs of parts (a) and (b) as exercises.

Proof (c)  Properties (c) and (m) in Theorem 1.4.1 imply that

$$(kA)(k^{-1}A^{-1}) = k^{-1}(kA)A^{-1} = (k^{-1}k)AA^{-1} = (1)I = I$$

and similarly, $(k^{-1}A^{-1})(kA) = I$. Thus, kA is invertible and $(kA)^{-1} = k^{-1}A^{-1}$.


EXAMPLE  10  Properties  of  Exponents 

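Let A and A^{-1} be the matrices in Example 9; that is,

A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} and A^{-1} = \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix}

Then

A^{-3} = (A^{-1})^3 = \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 3 & -2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 41 & -30 \\ -15 & 11 \end{bmatrix}

Also,

A^3 = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} 11 & 30 \\ 15 & 41 \end{bmatrix}

so, as expected from Theorem 1.4.7(b),

(A^3)^{-1} = \frac{1}{(11)(41) - (30)(15)} \begin{bmatrix} 41 & -30 \\ -15 & 11 \end{bmatrix} = \begin{bmatrix} 41 & -30 \\ -15 & 11 \end{bmatrix} = (A^{-1})^3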
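
The definitions of integer powers translate directly into a short routine. This is an illustrative sketch only: the function names are hypothetical, negative exponents are handled through a 2 x 2 inverse formula (so the routine as written is restricted to 2 x 2 matrices when n is negative), and exact fractions are assumed.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def identity(n):
    """The n x n identity matrix."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def inverse_2x2(M):
    """Invert a 2x2 matrix with the ad - bc formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def power(M, n):
    """Return M^n for an integer n (M must be an invertible 2x2 matrix if n < 0)."""
    base = M if n >= 0 else inverse_2x2(M)  # A^{-n} is defined as (A^{-1})^n
    result = identity(len(M))               # A^0 = I
    for _ in range(abs(n)):
        result = matmul(result, base)
    return result

A = [[1, 2], [1, 3]]          # the matrix of Example 10
A_cubed = power(A, 3)
A_neg_cubed = power(A, -3)
print(matmul(A_cubed, A_neg_cubed) == identity(2))  # checks (A^3)^{-1} = A^{-3}
```

Repeated multiplication is used here because it mirrors the definition; a production routine would more likely use repeated squaring.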


EXAMPLE  11  The  Square  of  a Matrix  Sum 

In  real  arithmetic,  where  we  have  a commutative  law  for  multiplication,  we  can  write 

(a + b)^2 = a^2 + ab + ba + b^2 = a^2 + ab + ab + b^2 = a^2 + 2ab + b^2

However, in matrix arithmetic, where we have no commutative law for multiplication, the best
we can do is to write

(A + B)^2 = A^2 + AB + BA + B^2

It is only in the special case where A and B commute (i.e., AB = BA) that we can go a step
further and write

(A + B)^2 = A^2 + 2AB + B^2
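
A small numerical experiment makes the distinction concrete. In the sketch below the two matrices are chosen only for illustration (they are not from the text) and do not commute; the helper names are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(*matrices):
    """Entrywise sum of one or more matrices of the same size."""
    return [[sum(entries) for entries in zip(*rows)] for rows in zip(*matrices)]

# Illustrative matrices with AB != BA.
A = [[1, 2], [0, 1]]
B = [[1, 0], [3, 1]]

S = matadd(A, B)
square = matmul(S, S)                                  # (A + B)^2
expanded = matadd(matmul(A, A), matmul(A, B),
                  matmul(B, A), matmul(B, B))          # A^2 + AB + BA + B^2
shortcut = matadd(matmul(A, A), matmul(A, B),
                  matmul(A, B), matmul(B, B))          # A^2 + 2AB + B^2

print(square == expanded)   # the expansion that is always valid
print(square == shortcut)   # valid only when A and B commute
```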


Matrix  Polynomials 

If A is a square matrix, say n × n, and if

p(x) = a_0 + a_1 x + a_2 x^2 + ··· + a_m x^m

is any polynomial, then we define the n × n matrix p(A) to be

p(A) = a_0 I + a_1 A + a_2 A^2 + ··· + a_m A^m (3)

where I is the n × n identity matrix; that is, p(A) is obtained by substituting A for x and replacing the constant
term a_0 by the matrix a_0 I. An expression of form (3) is called a matrix polynomial in A.

EXAMPLE  12  A Matrix  Polynomial 


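Find p(A) for

p(x) = x^2 - 2x - 3 and A = \begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix}

Solution

p(A) = A^2 - 2A - 3I

= \begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix}^2 - 2 \begin{bmatrix} -1 & 2 \\ 0 & 3 \end{bmatrix} - 3 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

= \begin{bmatrix} 1 & 4 \\ 0 & 9 \end{bmatrix} - \begin{bmatrix} -2 & 4 \\ 0 & 6 \end{bmatrix} - \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}

or more briefly, p(A) = 0.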


It follows from the fact that A^r A^s = A^{r+s} = A^{s+r} = A^s A^r that powers of a square matrix
commute, and since a matrix polynomial in A is built up from powers of A, any two matrix polynomials in A
also commute; that is, for any polynomials p_1 and p_2 we have

p_1(A) p_2(A) = p_2(A) p_1(A) (4)
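
Both the evaluation in Example 12 and the commutativity statement in Formula (4) can be checked with a few lines of code. The sketch below is an assumption-laden illustration: poly_of_matrix is a hypothetical helper that evaluates the polynomial by Horner's rule (a standard rearrangement, not the method used in the text), and the second pair of polynomials is made up for the commutativity check.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def poly_of_matrix(coeffs, A):
    """Evaluate a_0 I + a_1 A + ... + a_m A^m, where coeffs = [a_0, ..., a_m]."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for a in reversed(coeffs):
        result = matmul(result, A)   # multiply the running value by A ...
        for i in range(n):
            result[i][i] += a        # ... then add a times the identity
    return result

A = [[-1, 2], [0, 3]]                      # the matrix of Example 12
p_of_A = poly_of_matrix([-3, -2, 1], A)    # p(x) = x^2 - 2x - 3
print(p_of_A)

# Two made-up polynomials, used only to illustrate Formula (4).
p1_of_A = poly_of_matrix([4, 0, 1], A)     # p1(x) = x^2 + 4
p2_of_A = poly_of_matrix([1, -5], A)       # p2(x) = -5x + 1
print(matmul(p1_of_A, p2_of_A) == matmul(p2_of_A, p1_of_A))
```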


Properties  of  the  Transpose 


The  following  theorem  lists  the  main  properties  of  the  transpose. 


THEOREM  1.4.8 

If  the  sizes  of  the  matrices  are  such  that  the  stated  operations  can  be  performed,  then: 


(a) (A^T)^T = A

(b) (A + B)^T = A^T + B^T

(c) (A - B)^T = A^T - B^T

(d) (kA)^T = kA^T

(e) (AB)^T = B^T A^T


If you keep in mind that transposing a matrix interchanges its rows and columns, then you should have little
trouble visualizing the results in parts (a)-(d). For example, part (a) states the obvious fact that interchanging
rows and columns twice leaves a matrix unchanged; and part (b) states that adding two matrices and then
interchanging the rows and columns produces the same result as interchanging the rows and columns before
adding. We will omit the formal proofs. Part (e) is less obvious, but for brevity we will omit its proof as
well. The result in that part can be extended to three or more factors and restated as:


The  transpose  of  a product  of  any  number  of  matrices  is  the  product  of  the  transposes  in  the  reverse 
order. 


The  following  theorem  establishes  a relationship  between  the  inverse  of  a matrix  and  the  inverse  of  its 
transpose. 


THEOREM  1.4.9 


If A is an invertible matrix, then A^T is also invertible and

(A^T)^{-1} = (A^{-1})^T


We can establish the invertibility and obtain the formula at the same time by showing that

A^T (A^{-1})^T = (A^{-1})^T A^T = I

But from part (e) of Theorem 1.4.8 and the fact that I^T = I, we have

A^T (A^{-1})^T = (A^{-1}A)^T = I^T = I
(A^{-1})^T A^T = (AA^{-1})^T = I^T = I


which  completes  the  proof. 


EXAMPLE  13  Inverse  of  a Transpose 

Consider a general 2 × 2 invertible matrix and its transpose:

A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} and A^T = \begin{bmatrix} a & c \\ b & d \end{bmatrix}

Since A is invertible, its determinant ad - bc is nonzero. But the determinant of A^T is also
ad - bc (verify), so A^T is also invertible. It follows from Theorem 1.4.5 that

(A^T)^{-1} = \begin{bmatrix} \frac{d}{ad - bc} & -\frac{c}{ad - bc} \\ -\frac{b}{ad - bc} & \frac{a}{ad - bc} \end{bmatrix}

which is the same matrix that results if A^{-1} is transposed (verify). Thus,

(A^T)^{-1} = (A^{-1})^T

as  guaranteed  by  Theorem  1.4.9. 
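
For a concrete instance of Theorem 1.4.9 and of part (e) of Theorem 1.4.8, the matrices A and B of Example 9 can be reused. This is a hedged sketch with hypothetical helper names; it checks particular matrices and is not a proof.

```python
from fractions import Fraction

def transpose(M):
    """Interchange the rows and columns of M."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Invert a 2x2 matrix with the ad - bc formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2], [1, 3]]
B = [[3, 2], [2, 2]]

inverse_of_transpose = inverse_2x2(transpose(A))   # (A^T)^{-1}
transpose_of_inverse = transpose(inverse_2x2(A))   # (A^{-1})^T
print(inverse_of_transpose == transpose_of_inverse)

# Part (e) of Theorem 1.4.8: (AB)^T = B^T A^T.
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))
```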


Concept  Review 

Commutative  law  for  matrix  addition 

Associative  law  for  matrix  addition 

Associative  law  for  matrix  multiplication 

Left  and  right  distributive  laws 

Zero  matrix 

Identity  matrix 

Inverse  of  a matrix 

Invertible  matrix 

Nonsingular  matrix 

Singular  matrix 

Determinant 

Power  of  a matrix 


Matrix  polynomial 

Skills 

Know  the  arithmetic  properties  of  matrix  operations. 

Be  able  to  prove  arithmetic  properties  of  matrices. 

Know  the  properties  of  zero  matrices. 

Know  the  properties  of  identity  matrices. 

Be  able  to  recognize  when  two  square  matrices  are  inverses  of  each  other. 

Be  able  to  determine  whether  a 2 x 2 matrix  is  invertible. 

Be  able  to  solve  a linear  system  of  two  equations  in  two  unknowns  whose  coefficient  matrix  is 
invertible. 

Be  able  to  prove  basic  properties  involving  invertible  matrices. 

Know  the  properties  of  the  matrix  transpose  and  its  relationship  with  invertible  matrices. 


Exercise  Set  1 .4 

1.  Let 


2 -1  3" 

CO 

1 

LO 

1 

Ul 

i 

'0  -2  3' 

0 4 5 

, B = 

0 1 2 

, c = 

1 7 4 

-2  1 4 

4-7  6 

3 5 9 

Show  that 

(a) A + (B + C) = (A + B) + C

(b) (AB)C = A(BC)

(c) (a + b)C = aC + bC

(d) a(B - C) = aB - aC

2.  Using  the  matrices  and  scalars  in  Exercise  1,  verify  that 

(a) a(BC) = (aB)C = B(aC)

(b) A(B - C) = AB - AC

(c) (B + C)A = BA + CA

(d) a(bC) = (ab)C

3.  Using  the  matrices  and  scalars  in  Exercise  1,  verify  that 


(a) (A^T)^T = A

(b) (A + B)^T = A^T + B^T

(c) (aC)^T = aC^T

(d) (AB)^T = B^T A^T


In  Exercises  4-7  use  Theorem  1.4.5  to  compute  the  inverses  of  the  following  matrices. 


II 

1 

OJ 

1 

L5  2j 

II 

* r* 

'2  -3 

Answer: 


B~l 


1 J_ 

5 20 

I J_ 

"5  10 


6. 


C = 


7. 


D = 


2 0 
0 3 


Answer: 


D~l 


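8. Find the inverse of

\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}

9. Find the inverse of

\begin{bmatrix} \frac{1}{2}(e^x + e^{-x}) & \frac{1}{2}(e^x - e^{-x}) \\ \frac{1}{2}(e^x - e^{-x}) & \frac{1}{2}(e^x + e^{-x}) \end{bmatrix}

Answer:

\begin{bmatrix} \frac{1}{2}(e^x + e^{-x}) & -\frac{1}{2}(e^x - e^{-x}) \\ -\frac{1}{2}(e^x - e^{-x}) & \frac{1}{2}(e^x + e^{-x}) \end{bmatrix}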


10. Use the matrix A in Exercise 4 to verify that (A^T)^{-1} = (A^{-1})^T.

11. Use the matrix B in Exercise 5 to verify that (B^T)^{-1} = (B^{-1})^T.

12. Use the matrices A and B in Exercises 4 and 5 to verify that (AB)^{-1} = B^{-1}A^{-1}.

13. Use the matrices A, B, and C in Exercises 4-6 to verify that (ABC)^{-1} = C^{-1}B^{-1}A^{-1}.

In  Exercises  14-17,  use  the  given  information  to  find  A. 


U'A-'  = 


2 -1 

3 5 


15‘  (1A) -1  = 


-3  7 

1 -2 


Answer: 


A = 


1 1 

1 1 


16. 


17'  (/  + 2 A) -1 


-3  -1 

5 2 

-1  2 

4 5 


Answer: 

__2_  J_ 

13  13 

2_  _6_ 

13  13 

18. Let A be the matrix

\begin{bmatrix} 2 & 0 \\ 4 & 1 \end{bmatrix}

In each part, compute the given quantity.

(a) A^3

(b) A^{-3}

(c) A^2 - 2A + I

(d) p(A), where p(x) = x - 2

(e) p(A), where p(x) = 2x^2 - x + 1

(f) p(A), where p(x) = x^3 - 2x + 4

19.  Repeat  Exercise  18  for  the  matrix 


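Answer:

(a) \begin{bmatrix} 41 & 15 \\ 30 & 11 \end{bmatrix}

(b) \begin{bmatrix} 11 & -15 \\ -30 & 41 \end{bmatrix}

(c) \begin{bmatrix} 6 & 2 \\ 4 & 2 \end{bmatrix}

(d) \begin{bmatrix} 1 & 1 \\ 2 & -1 \end{bmatrix}

(e) \begin{bmatrix} 20 & 7 \\ 14 & 6 \end{bmatrix}

(f) \begin{bmatrix} 39 & 13 \\ 26 & 13 \end{bmatrix}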

20. Repeat Exercise 18 for the matrix


21.  Repeat  Exercise  18  for  the  matrix 


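Answer:

(a) \begin{bmatrix} 27 & 0 & 0 \\ 0 & 26 & -18 \\ 0 & 18 & 26 \end{bmatrix}

(b) \begin{bmatrix} 1/27 & 0 & 0 \\ 0 & 0.026 & 0.018 \\ 0 & -0.018 & 0.026 \end{bmatrix}

(c) \begin{bmatrix} 4 & 0 & 0 \\ 0 & -5 & -12 \\ 0 & 12 & -5 \end{bmatrix}

(d) \begin{bmatrix} 1 & 0 & 0 \\ 0 & -3 & 3 \\ 0 & -3 & -3 \end{bmatrix}

(e) \begin{bmatrix} 16 & 0 & 0 \\ 0 & -14 & -15 \\ 0 & 15 & -14 \end{bmatrix}

(f) \begin{bmatrix} 25 & 0 & 0 \\ 0 & 32 & -24 \\ 0 & 24 & 32 \end{bmatrix}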

In Exercises 22-24, let p_1(x) = x^2 - 9, p_2(x) = x + 3, and p_3(x) = x - 3. Show that
p_1(A) = p_2(A) p_3(A) for the given matrix.


22. The matrix A in Exercise 18.

23. The matrix A in Exercise 21.

24.  An  arbitrary  square  matrix  A. 

25. Show that if p(x) = x^2 - (a + d)x + (ad - bc) and

A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

then p(A) = 0.

26. Show that if p(x) = x^3 - (a + b + e)x^2 + (ab + ae + be - cd)x - a(be - cd) and

A = \begin{bmatrix} a & 0 & 0 \\ 0 & b & c \\ 0 & d & e \end{bmatrix}

then p(A) = 0.

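27. Consider the matrix

A = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}

where a_{11} a_{22} ··· a_{nn} ≠ 0. Show that A is invertible and find its inverse.

Answer:

A^{-1} = \begin{bmatrix} 1/a_{11} & 0 & \cdots & 0 \\ 0 & 1/a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/a_{nn} \end{bmatrix}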

28. Show that if a square matrix A satisfies A^2 - 3A + I = 0, then A^{-1} = 3I - A.

29. (a) Show that a matrix with a row of zeros cannot have an inverse.

(b)  Show  that  a matrix  with  a column  of  zeros  cannot  have  an  inverse. 

30. Assuming that all matrices are n × n and invertible, solve for D.

ABC^T DBA^T C = AB^T


Answer: 

B~l 

34.  Simplify: 

(AC^{-1})^{-1} (AC^{-1}) (AC^{-1})^{-1} AD^{-1}

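In Exercises 35-37, determine whether A is invertible, and if so, find the inverse. [Hint: Solve AX = I for X
by equating corresponding entries on the two sides.]

35. A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}

Answer:

A^{-1} = \begin{bmatrix} 1/2 & 1/2 & -1/2 \\ -1/2 & 1/2 & 1/2 \\ 1/2 & -1/2 & 1/2 \end{bmatrix}

36. A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{bmatrix}

37. A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 1 & 0 \\ -1 & 1 & 1 \end{bmatrix}

Answer:

A^{-1} = \begin{bmatrix} 1/2 & 1/2 & -1/2 \\ -1/2 & 1/2 & 1/2 \\ 1 & 0 & 0 \end{bmatrix}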

38.  Prove  Theorem  1.4.2. 

In  Exercises  39-42,  use  the  method  of  Example  8 to  find  the  unique  solution  of  the  given  linear  system. 

39. 3x_1 - 2x_2 = -1
    4x_1 + 5x_2 = 3

Answer:

x_1 = 1/23, x_2 = 13/23

40. -x_1 + 5x_2 = 4
    -x_1 - 3x_2 = 1

41. 6x_1 + x_2 = 0
    4x_1 - 3x_2 = -2

Answer:

x_1 = -1/11, x_2 = 6/11

42. 2x_1 - 2x_2 = 4
    x_1 + 4x_2 = 4

43.  Prove  part  (a)  of  Theorem  1.4.1. 

44.  Prove  part  (c)  of  Theorem  1.4.1. 

45.  Prove  part  (f)  of  Theorem  1.4.1. 

46.  Prove  part  (b)  of  Theorem  1.4.2. 

47.  Prove  part  (c)  of  Theorem  1.4.2. 

48.  Verify  Formula  4 in  the  text  by  a direct  calculation. 

49.  Prove  part  (d)  of  Theorem  1.4.8. 

50.  Prove  part  (e)  of  Theorem  1.4.8. 

51. (a) Show that if A is invertible and AB = AC, then B = C.

(b)  Explain  why  part  (a)  and  Example  3 do  not  contradict  one  another. 

52. Show that if A is invertible and k is any nonzero scalar, then (kA)^n = k^n A^n for all integer values of n.

53. (a) Show that if A, B, and A + B are invertible matrices with the same size, then

A(A^{-1} + B^{-1})B(A + B)^{-1} = I

(b) What does the result in part (a) tell you about the matrix A^{-1} + B^{-1}?


54. A square matrix A is said to be idempotent if A^2 = A.

(a) Show that if A is idempotent, then so is I - A.

(b) Show that if A is idempotent, then 2A - I is invertible and is its own inverse.

55. Show that if A is a square matrix such that A^k = 0 for some positive integer k, then the matrix I - A is
invertible and

(I - A)^{-1} = I + A + A^2 + ··· + A^{k-1}

True-False  Exercises 

In  parts  (a)-(k)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a) Two n × n matrices, A and B, are inverses of one another if and only if AB = BA = 0.

Answer:

False

(b) For all square matrices A and B of the same size, it is true that (A + B)^2 = A^2 + 2AB + B^2.

Answer:

False

(c) For all square matrices A and B of the same size, it is true that A^2 - B^2 = (A - B)(A + B).

Answer:

False

(d) If A and B are invertible matrices of the same size, then AB is invertible and (AB)^{-1} = A^{-1}B^{-1}.

Answer:

False

(e) If A and B are matrices such that AB is defined, then it is true that (AB)^T = A^T B^T.

Answer:

False

(f) The matrix

A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

is invertible if and only if ad - bc ≠ 0.

Answer:

True

(g) If A and B are matrices of the same size and k is a constant, then (kA + B)^T = kA^T + B^T.

Answer:

True

(h) If A is an invertible matrix, then so is A^T.

Answer:

True

(i) If p(x) = a_0 + a_1 x + a_2 x^2 + ··· + a_m x^m and I is an identity matrix, then
p(I) = a_0 + a_1 + a_2 + ··· + a_m.

Answer:

False

(j) A square matrix containing a row or column of zeros cannot be invertible.
Answer: 

True 

(k)  The  sum  of  two  invertible  matrices  of  the  same  size  must  be  invertible. 
Answer: 

False 
1.5 Elementary Matrices and a Method for Finding A^{-1}


In  this  section  we  will  develop  an  algorithm  for  finding  the  inverse  of  a matrix,  and  we  will  discuss  some  of  the 
basic  properties  of  invertible  matrices. 

In Section 1.1 we defined three elementary row operations on a matrix A:

Multiply  a row  by  a nonzero  constant  c. 

Interchange  two  rows. 

Add  a constant  c times  one  row  to  another. 

It  should  be  evident  that  if  we  let  B be  the  matrix  that  results  from  A by  performing  one  of  the  operations  in  this 
list,  then  the  matrix  A can  be  recovered  from  B by  performing  the  corresponding  operation  in  the  following  list: 

Multiply the same row by 1/c.

Interchange the same two rows.

If B resulted by adding c times row r_1 of A to row r_2, then add -c times r_1 to r_2.

It  follows  that  if  B is  obtained  from  A by  performing  a sequence  of  elementary  row  operations,  then  there  is  a 
second  sequence  of  elementary  row  operations,  which  when  applied  to  B recovers  A (Exercise  43).  Accordingly, 
we  make  the  following  definition. 


DEFINITION  1 

Matrices  A and  B are  said  to  be  row  equivalent  if  either  (hence  each)  can  be  obtained  from  the  other  by 
a sequence  of  elementary  row  operations. 




Our  next  goal  is  to  show  how  matrix  multiplication  can  be  used  to  carry  out  an  elementary  row  operation. 



DEFINITION  2 

An n × n matrix is called an elementary matrix if it can be obtained from the n × n identity matrix I_n
by performing a single elementary row operation.


EXAMPLE  1 Elementary  Matrices  and  Row  Operations 


Listed  below  are  four  elementary  matrices  and  the  operations  that  produce  them. 


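\begin{bmatrix} 1 & 0 \\ 0 & -3 \end{bmatrix}
Multiply the second row of I_2 by -3.

\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}
Interchange the second and fourth rows of I_4.

\begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
Add 3 times the third row of I_3 to the first row.

\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
Multiply the first row of I_3 by 1.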

The following theorem, whose proof is left as an exercise, shows that when a matrix A is multiplied on the left
by an elementary matrix E, the effect is to perform an elementary row operation on A.


THEOREM 1.5.1 Row Operations by Matrix Multiplication

If the elementary matrix E results from performing a certain row operation on I_m and if A is an m × n
matrix, then the product EA is the matrix that results when this same row operation is performed on A.


EXAMPLE  2 Using  Elementary  Matrices 


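Consider the matrix

A = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 2 & -1 & 3 & 6 \\ 1 & 4 & 4 & 0 \end{bmatrix}

and consider the elementary matrix

E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}

which results from adding 3 times the first row of I_3 to the third row. The product EA is

EA = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 2 & -1 & 3 & 6 \\ 4 & 4 & 10 & 9 \end{bmatrix}

which is precisely the same matrix that results when we add 3 times the first row of A to the third
row.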
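
The statement that left multiplication by an elementary matrix performs the row operation can be tested directly on the data of Example 2. The sketch below is illustrative only; add_multiple is a hypothetical helper, rows are indexed from 0, and the entries of A are taken from Example 2 as reconstructed from its product EA.

```python
def identity(n):
    """The n x n identity matrix."""
    return [[int(i == j) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add_multiple(M, c, src, dst):
    """Return a copy of M with c times row src added to row dst (0-based indices)."""
    out = [row[:] for row in M]
    out[dst] = [x + c * y for x, y in zip(M[dst], M[src])]
    return out

A = [[1, 0, 2, 3],
     [2, -1, 3, 6],
     [1, 4, 4, 0]]

E = add_multiple(identity(3), 3, 0, 2)   # add 3 times row 1 of I_3 to row 3
via_product = matmul(E, A)               # EA
via_row_op = add_multiple(A, 3, 0, 2)    # the same operation applied to A directly
print(via_product == via_row_op)
```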


Theorem  1.5.1  will  be  a useful  tool  for 
developing  new  results  about  matrices, 
but  as  a practical  matter  it  is  usually 
preferable  to  perform  row  operations 
directly. 


We know from the discussion at the beginning of this section that if E is an elementary matrix that results from
performing an elementary row operation on an identity matrix I, then there is a second elementary row
operation, which when applied to E, produces I back again. Table 1 lists these operations. The operations on the
right side of the table are called the inverse operations of the corresponding operations on the left.


Table 1

Row Operation on I That Produces E      Row Operation on E That Reproduces I

Multiply row i by c ≠ 0                 Multiply row i by 1/c

Interchange rows i and j                Interchange rows i and j

Add c times row i to row j              Add -c times row i to row j

EXAMPLE  3 Row  Operations  and  Inverse  Row  Operations 

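In each of the following, an elementary row operation is applied to the 2 × 2 identity matrix to
obtain an elementary matrix E, then E is restored to the identity matrix by applying the inverse row
operation.

\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} → \begin{bmatrix} 1 & 0 \\ 0 & 7 \end{bmatrix} → \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
First arrow: Multiply the second row by 7. Second arrow: Multiply the second row by 1/7.

\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} → \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} → \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
First arrow: Interchange the first and second rows. Second arrow: Interchange the first and second rows.

\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} → \begin{bmatrix} 1 & 5 \\ 0 & 1 \end{bmatrix} → \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
First arrow: Add 5 times the second row to the first. Second arrow: Add -5 times the second row to the first.
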

The  next  theorem  is  a key  result  about  invertibility  of  elementary  matrices.  It  will  be  a building  block  for  many 
results  that  follow. 


THEOREM  1.5.2 


Every  elementary  matrix  is  invertible,  and  the  inverse  is  also  an  elementary  matrix. 


If E is an elementary matrix, then E results by performing some row operation on I. Let E_0 be the
matrix that results when the inverse of this operation is performed on I. Applying Theorem 1.5.1 and using the
fact that inverse row operations cancel the effect of each other, it follows that

E_0 E = I and E E_0 = I

Thus, the elementary matrix E_0 is the inverse of E.
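
The three pairs of operations in Example 3 give a quick check of Theorem 1.5.2. The following sketch builds each elementary matrix E and the matrix E_0 of the inverse operation and confirms that the products in both orders are the identity. The helper names are hypothetical and rows are indexed from 0.

```python
from fractions import Fraction

def identity(n):
    """The n x n identity matrix."""
    return [[int(i == j) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def scale(M, i, c):
    """Multiply row i of M by the nonzero constant c."""
    out = [row[:] for row in M]
    out[i] = [c * x for x in M[i]]
    return out

def swap(M, i, j):
    """Interchange rows i and j of M."""
    out = [row[:] for row in M]
    out[i], out[j] = out[j], out[i]
    return out

def add_multiple(M, c, src, dst):
    """Add c times row src of M to row dst."""
    out = [row[:] for row in M]
    out[dst] = [x + c * y for x, y in zip(M[dst], M[src])]
    return out

I2 = identity(2)
# Each pair is (E, E_0): an operation on I and its inverse operation on I.
pairs = [
    (scale(I2, 1, 7), scale(I2, 1, Fraction(1, 7))),
    (swap(I2, 0, 1), swap(I2, 0, 1)),
    (add_multiple(I2, 5, 1, 0), add_multiple(I2, -5, 1, 0)),
]
all_inverse = all(matmul(E0, E) == I2 and matmul(E, E0) == I2 for E, E0 in pairs)
print(all_inverse)
```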


Equivalence  Theorem 

One  of  our  objectives  as  we  progress  through  this  text  is  to  show  how  seemingly  diverse  ideas  in  linear  algebra 
are  related.  The  following  theorem,  which  relates  results  we  have  obtained  about  invertibility  of  matrices, 
homogeneous  linear  systems,  reduced  row  echelon  forms,  and  elementary  matrices,  is  our  first  step  in  that 
direction.  As  we  study  new  topics,  more  statements  will  be  added  to  this  theorem. 


THEOREM 1.5.3 Equivalent Statements

If A is an n × n matrix, then the following statements are equivalent, that is, all true or all false.

(a) A is invertible.

(b) Ax = 0 has only the trivial solution.

(c) The reduced row echelon form of A is I_n.

(d) A is expressible as a product of elementary matrices.


The logic of our proof of Theorem 1.5.3 may be more apparent if we write the implications

(a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a)

This makes it evident visually that the validity
of any one statement implies the validity of all
the others, and hence that the falsity of any one
implies the falsity of the others.


We will prove the equivalence by establishing the chain of implications:

(a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a)

(a) ⇒ (b) Assume A is invertible and let x_0 be any solution of Ax = 0. Multiplying both sides of this equation by the
matrix A^{-1} gives A^{-1}(Ax_0) = A^{-1}0, or (A^{-1}A)x_0 = 0, or Ix_0 = 0, or x_0 = 0. Thus, Ax = 0 has only the
trivial solution.

(b) ⇒ (c) Let Ax = 0 be the matrix form of the system

a_{11}x_1 + a_{12}x_2 + ··· + a_{1n}x_n = 0
a_{21}x_1 + a_{22}x_2 + ··· + a_{2n}x_n = 0
  ...
a_{n1}x_1 + a_{n2}x_2 + ··· + a_{nn}x_n = 0          (1)

and assume that the system has only the trivial solution. If we solve by Gauss-Jordan elimination, then the
system of equations corresponding to the reduced row echelon form of the augmented matrix will be

x_1 = 0
x_2 = 0
  ...
x_n = 0          (2)

Thus the augmented matrix

\left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} & 0 \end{array}\right]

for (1) can be reduced to the augmented matrix

\left[\begin{array}{ccccc|c} 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{array}\right]

for (2) by a sequence of elementary row operations. If we disregard the last column (all zeros) in each of these
matrices, we can conclude that the reduced row echelon form of A is I_n.

(c) ⇒ (d) Assume that the reduced row echelon form of A is I_n, so that A can be reduced to I_n by a finite
sequence of elementary row operations. By Theorem 1.5.1, each of these operations can be accomplished by
multiplying on the left by an appropriate elementary matrix. Thus we can find elementary matrices
E_1, E_2, ..., E_k such that

E_k ··· E_2 E_1 A = I_n          (3)

By Theorem 1.5.2, E_1, E_2, ..., E_k are invertible. Multiplying both sides of Equation (3) on the left successively
by E_k^{-1}, ..., E_2^{-1}, E_1^{-1} we obtain

A = E_1^{-1} E_2^{-1} ··· E_k^{-1} I_n = E_1^{-1} E_2^{-1} ··· E_k^{-1}          (4)

By Theorem 1.5.2, this equation expresses A as a product of elementary matrices.

(d) ⇒ (a) If A is a product of elementary matrices, then from Theorem 1.4.6 and Theorem 1.5.2, the matrix A
is a product of invertible matrices and hence is invertible.
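
The step (c) ⇒ (d) of the proof is constructive, and the construction can be sketched in code: reduce A to the identity, and for every row operation used record the elementary matrix of the inverse operation. Multiplying the recorded matrices in the order the operations were applied then reproduces A, as in Equation (4). The routine below is a minimal sketch under stated assumptions: elementary_factors is a hypothetical name, the particular order of row operations is one choice among many, and the test matrix is the one used in Example 4 of this section.

```python
from fractions import Fraction

def identity(n):
    """The n x n identity matrix with exact rational entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def elementary_factors(A):
    """Return elementary matrices whose product (in list order) is A, or None if A is singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    factors = []  # E_1^{-1}, E_2^{-1}, ... in the order the operations are applied
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                      # A cannot be reduced to I
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            F = identity(n)
            F[col], F[pivot] = F[pivot], F[col]   # a row interchange is its own inverse
            factors.append(F)
        p = M[col][col]
        if p != 1:
            M[col] = [x / p for x in M[col]]      # multiply the row by 1/p
            F = identity(n)
            F[col][col] = p                       # inverse operation: multiply by p
            factors.append(F)
        for r in range(n):
            c = M[r][col]
            if r != col and c != 0:
                M[r] = [x - c * y for x, y in zip(M[r], M[col])]  # add -c times pivot row
                F = identity(n)
                F[r][col] = c                     # inverse operation: add c times pivot row
                factors.append(F)
    return factors

A = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]
factors = elementary_factors(A)
product = identity(3)
for F in factors:
    product = matmul(product, F)
print(product == A)
```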


A Method  for  Inverting  Matrices 

As a first application of Theorem 1.5.3, we will develop a procedure (or algorithm) that can be used to tell
whether a given matrix is invertible, and if so, produce its inverse. To derive this algorithm, assume for the
moment that A is an invertible n × n matrix. In Equation (3), the elementary matrices execute a sequence of row
operations that reduce A to I_n. If we multiply both sides of this equation on the right by A^{-1} and simplify, we
obtain

A^{-1} = E_k ··· E_2 E_1 I_n

But this equation tells us that the same sequence of row operations that reduces A to I_n will transform I_n to
A^{-1}. Thus, we have established the following result.


Inversion Algorithm

To find the inverse of an invertible matrix A, find a sequence of elementary row operations that reduces
A to the identity and then perform that same sequence of operations on I_n to obtain A^{-1}.


A simple  method  for  carrying  out  this  procedure  is  given  in  the  following  example. 


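EXAMPLE 4 Using Row Operations to Find A^{-1}


Find the inverse of

A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}

Solution We want to reduce A to the identity matrix by row operations and simultaneously
apply these operations to I to produce A^{-1}. To accomplish this we will adjoin the identity matrix
to the right side of A, thereby producing a partitioned matrix of the form

[A | I]

Then we will apply row operations to this matrix until the left side is reduced to I; these
operations will convert the right side to A^{-1}, so the final matrix will have the form

[I | A^{-1}]

The computations are as follows:

\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 2 & 5 & 3 & 0 & 1 & 0 \\ 1 & 0 & 8 & 0 & 0 & 1 \end{array}\right]

\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & -2 & 5 & -1 & 0 & 1 \end{array}\right]
We added -2 times the first row to the second and -1 times the first row to the third.

\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & 0 & -1 & -5 & 2 & 1 \end{array}\right]
We added 2 times the second row to the third.

\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right]
We multiplied the third row by -1.

\left[\begin{array}{ccc|ccc} 1 & 2 & 0 & -14 & 6 & 3 \\ 0 & 1 & 0 & 13 & -5 & -3 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right]
We added 3 times the third row to the second and -3 times the third row to the first.

\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -40 & 16 & 9 \\ 0 & 1 & 0 & 13 & -5 & -3 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right]
We added -2 times the second row to the first.

Thus,

A^{-1} = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix}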

Often it will not be known in advance if a given n × n matrix A is invertible. However, if it is not, then by parts
(a) and (c) of Theorem 1.5.3 it will be impossible to reduce A to I_n by elementary row operations. This will be
signaled by a row of zeros appearing on the left side of the partition at some stage of the inversion algorithm. If
this occurs, then you can stop the computations and conclude that A is not invertible.

EXAMPLE  5 Showing  That  a Matrix  Is  Not  Invertible 


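Consider the matrix

A = \begin{bmatrix} 1 & 6 & 4 \\ 2 & 4 & -1 \\ -1 & 2 & 5 \end{bmatrix}

Applying the procedure of Example 4 yields

\left[\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 2 & 4 & -1 & 0 & 1 & 0 \\ -1 & 2 & 5 & 0 & 0 & 1 \end{array}\right]

\left[\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 8 & 9 & 1 & 0 & 1 \end{array}\right]
We added -2 times the first row to the second and added the first row to the third.

\left[\begin{array}{ccc|ccc} 1 & 6 & 4 & 1 & 0 & 0 \\ 0 & -8 & -9 & -2 & 1 & 0 \\ 0 & 0 & 0 & -1 & 1 & 1 \end{array}\right]
We added the second row to the third.

Since we have obtained a row of zeros on the left side, A is not invertible.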
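
The procedure of Examples 4 and 5 can be sketched as a short routine that row-reduces the partitioned matrix [A | I] and stops as soon as the left side cannot be reduced to the identity. This is an illustration under assumptions, not the text's own code: the name invert is hypothetical, exact fractions are used, and the row operations are applied column by column, which is a different (but equivalent) order from the hand computation in Example 4.

```python
from fractions import Fraction

def invert(A):
    """Reduce [A | I] to [I | A^{-1}]; return None if A is not invertible."""
    n = len(A)
    # Build the partitioned matrix [A | I] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look for a nonzero entry in this column, on or below the diagonal.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                           # left side cannot become I
        M[col], M[pivot] = M[pivot], M[col]       # interchange two rows
        p = M[col][col]
        M[col] = [x / p for x in M[col]]          # multiply the row by 1/p
        for r in range(n):
            c = M[r][col]
            if r != col and c != 0:
                # add -c times the pivot row to row r
                M[r] = [x - c * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]                 # the right side is now A^{-1}

print(invert([[1, 2, 3], [2, 5, 3], [1, 0, 8]]))    # matrix of Example 4
print(invert([[1, 6, 4], [2, 4, -1], [-1, 2, 5]]))  # matrix of Example 5
```

Exact fractions are a deliberate choice here: with floating-point entries the test for a zero pivot would need a tolerance, which is outside the scope of this section.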


EXAMPLE  6 Analyzing  Homogeneous  Systems 

Use  Theorem  1.5.3  to  determine  whether  the  given  homogeneous  system  has  nontrivial  solutions. 

(a) x_1 + 2x_2 + 3x_3 = 0
    2x_1 + 5x_2 + 3x_3 = 0
    x_1 + 8x_3 = 0

(b) x_1 + 6x_2 + 4x_3 = 0
    2x_1 + 4x_2 - x_3 = 0
    -x_1 + 2x_2 + 5x_3 = 0

Solution From parts (a) and (b) of Theorem 1.5.3 a homogeneous linear system has only the
trivial solution if and only if its coefficient matrix is invertible. From Example 4 and Example 5
the coefficient matrix of system (a) is invertible and that of system (b) is not. Thus, system (a) has
only the trivial solution whereas system (b) has nontrivial solutions.
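
The same conclusion can be reached through parts (b) and (c) of Theorem 1.5.3: the system Ax = 0 has only the trivial solution exactly when the reduced row echelon form of A is the identity. The sketch below assumes a hypothetical helper rref and applies it to the coefficient matrices of systems (a) and (b).

```python
from fractions import Fraction

def rref(A):
    """Return the reduced row echelon form of A, computed with exact fractions."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    lead = 0                                      # row that receives the next leading 1
    for col in range(cols):
        if lead == rows:
            break
        pivot = next((r for r in range(lead, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue                              # no leading 1 in this column
        M[lead], M[pivot] = M[pivot], M[lead]
        p = M[lead][col]
        M[lead] = [x / p for x in M[lead]]
        for r in range(rows):
            c = M[r][col]
            if r != lead and c != 0:
                M[r] = [x - c * y for x, y in zip(M[r], M[lead])]
        lead += 1
    return M

def has_only_trivial_solution(A):
    """True when Ax = 0 has only the trivial solution (A square)."""
    n = len(A)
    return rref(A) == [[int(i == j) for j in range(n)] for i in range(n)]

system_a = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]
system_b = [[1, 6, 4], [2, 4, -1], [-1, 2, 5]]
print(has_only_trivial_solution(system_a))
print(has_only_trivial_solution(system_b))
```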


Concept  Review 

Row  equivalent  matrices 
Elementary  matrix 
Inverse  operations 
Inversion  algorithm 

Skills 

Determine whether a given square matrix is elementary.

Determine  whether  two  square  matrices  are  row  equivalent. 

Apply the inverse of a given elementary row operation to a matrix.

Apply  elementary  row  operations  to  reduce  a given  square  matrix  to  the  identity  matrix. 


Understand  the  relationships  between  statements  that  are  equivalent  to  the  invertibility  of  a square 
matrix  (Theorem  1.5.3). 

Use  the  inversion  algorithm  to  find  the  inverse  of  an  invertible  matrix. 

Express  an  invertible  matrix  as  a product  of  elementary  matrices. 


Exercise  Set  1 .5 


1.  Decide  whether  each  matrix  below  is  an  elementary  matrix. 


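(a) \begin{bmatrix} 1 & 0 \\ -5 & 1 \end{bmatrix}

(b) \begin{bmatrix} -5 & 1 \\ 1 & 0 \end{bmatrix}

(c) \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}

(d) \begin{bmatrix} 2 & 0 & 0 & 2 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}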

Answer: 


(a)  Elementary 

(b)  Not  elementary 

(c)  Not  elementary 

(d)  Not  elementary 


2.  Decide  whether  each  matrix  below  is  an  elementary  matrix. 


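(a) \begin{bmatrix} 1 & 0 \\ 0 & \sqrt{3} \end{bmatrix}

(b) \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}

(c) \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 9 \\ 0 & 0 & 1 \end{bmatrix}

(d) \begin{bmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}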


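3. Find a row operation and the corresponding elementary matrix that will restore the given elementary matrix to
the identity matrix.

(a) \begin{bmatrix} 1 & -3 \\ 0 & 1 \end{bmatrix}

(b) \begin{bmatrix} -7 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

(c) \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -5 & 0 & 1 \end{bmatrix}

(d) \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

Answer:

(a) Add 3 times row 2 to row 1: \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}

(b) Multiply row 1 by -1/7: \begin{bmatrix} -1/7 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

(c) Add 5 times row 1 to row 3: \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 5 & 0 & 1 \end{bmatrix}

(d) Swap rows 1 and 3: \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}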


4. Find a row operation and the corresponding elementary matrix that will restore the given elementary matrix to
the identity matrix.


(a) 


(b) 


(c) 


1 0 
-3  1 

1 0 0 


0 1 
0 0 


0 0 0 1 
0 10  0 
0 0 10 
10  0 0 


(d) 


1 

0 

1 

0 

7 

0 

1 

0 

0 

0 

0 

1 

0 

0 

0 

0 

1 

5.  In  each  part,  an  elementary  matrix  E and  a matrix  A are  given.  Write  down  the  row  operation  corresponding 
to  E and  show  that  the  product  EA  results  from  applying  the  row  operation  to  A. 


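(a) E = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, A = \begin{bmatrix} -1 & -2 & 5 & -1 \\ 3 & -6 & -6 & -6 \end{bmatrix}

(b) E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}, A = \begin{bmatrix} 2 & -1 & 0 & -4 & -4 \\ 1 & -3 & -1 & 5 & 3 \\ 2 & 0 & 1 & 3 & -1 \end{bmatrix}

(c) E = \begin{bmatrix} 1 & 0 & 4 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, A = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}

Answer:

(a) Swap rows 1 and 2: EA = \begin{bmatrix} 3 & -6 & -6 & -6 \\ -1 & -2 & 5 & -1 \end{bmatrix}

(b) Add -3 times row 2 to row 3: EA = \begin{bmatrix} 2 & -1 & 0 & -4 & -4 \\ 1 & -3 & -1 & 5 & 3 \\ -1 & 9 & 4 & -12 & -10 \end{bmatrix}

(c) Add 4 times row 3 to row 1: EA = \begin{bmatrix} 13 & 28 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}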

6.  In  each  part,  an  elementary  matrix  E and  a matrix  A are  given.  Write  down  the  row  operation  corresponding 
to  E and  show  that  the  product  EA  results  from  applying  the  row  operation  to  A. 


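(a) E = \begin{bmatrix} -6 & 0 \\ 0 & 1 \end{bmatrix}, A = \begin{bmatrix} -1 & -2 & 5 & -1 \\ 3 & -6 & -6 & -6 \end{bmatrix}

(b) E = \begin{bmatrix} 1 & 0 & 0 \\ -4 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, A = \begin{bmatrix} 2 & -1 & 0 & -4 & -4 \\ 1 & -3 & -1 & 5 & 3 \\ 2 & 0 & 1 & 3 & -1 \end{bmatrix}

(c) E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}, A = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
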

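In Exercises 7-8, use the following matrices.

A = \begin{bmatrix} 3 & 4 & 1 \\ 2 & -7 & -1 \\ 8 & 1 & 5 \end{bmatrix}, B = \begin{bmatrix} 8 & 1 & 5 \\ 2 & -7 & -1 \\ 3 & 4 & 1 \end{bmatrix}, C = \begin{bmatrix} 3 & 4 & 1 \\ 2 & -7 & -1 \\ 2 & -7 & 3 \end{bmatrix}

D = \begin{bmatrix} 8 & 1 & 5 \\ -6 & 21 & 3 \\ 3 & 4 & 1 \end{bmatrix}, F = \begin{bmatrix} 8 & 1 & 5 \\ 8 & 1 & 1 \\ 3 & 4 & 1 \end{bmatrix}
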


7.  Find  an  elementary  matrix  E that  satisfies  the  equation. 

(a)  EA  = B 

(b)  EB  = A 

(c)  EA  = C 

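(d) EC = A

Answer:

(a) \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}

(b) \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}

(c) \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}

(d) \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}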

8.  Find  an  elementary  matrix  E that  satisfies  the  equation. 

(a)  EB  = D 

(b) ED = B

(c) EB = F

(d) EF = B

In  Exercises  9-24,  use  the  inversion  algorithm  to  find  the  inverse  of  the  given  matrix,  if  the  inverse  exists. 


9. 


1 4 

2 7 


Answer: 


10. 


11. 


4 

-1 


6 

5 


3 

-2 


Answer: 


l 

o|isj 

3 1 
7 7 

12. 

6 - 

-4' 

-3 

2_ 

13. 

'3  4 

-1 

1 0 

3 

2 5 

-4 

Answer: 


1 _n_  _6 

2 10  5 

-1  1 1 

_1  J_  2 
"2  10  5 


14. 


1 2 0 
2 1 2 
0 2 1 


2 4 1 

-4  2 -9 


Answer: 


No  inverse 


16. 


1 

5 

1 

5 

1 

5 


1 _2 

5 5 

1 J_ 
5 10 

4 2_ 

5 10 


17. 


1 0 1 
0 1 1 
1 1 0 


Answer: 


1 1 I 

2 2 2 

_1  1 1 

2 2 2 

1 I _i 

2 2 2 

18-  /2  3/2  0 

-4/2  /2  0 

0 0 1 

19.  [2  6 6" 

2 7 6 
2 7 7 

Answer: 


1 ° -3" 

-1  1 0 

0 -1  1 

20.  [1  0 0 0" 

13  0 0 
13  5 0 
13  5 7 

21.  [2  -4  0 O' 

1 2 12  0 

0 0 2 0 

0 -1  -4  -5 

Answer: 

1 1 -3 

4 2 

_1  I _3 

8 4 2 

0 0 i 

40  20  10 

-8  17  2 ^ 

4 0 | -9 

0 0 0 0 

-1  13  4 2 


23.  [-1  0 1 O' 

2 3-2  6 

0-1  2 0 

0 0 15 

Answer: 

~ _1_  JL  5 _l" 

12  24  8 4 

5 _5_  1 _I 

6 12  4 2 

_5_  _5_  5 _I 

12  24  8 4 

L _J_  _i  I 

12  24  8 4 

24.  [0  0 2 O' 

10  0 1 
0-13  0 

2 15-3 

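In Exercises 25-26, find the inverse of each of the following 4 × 4 matrices, where k_1, k_2, k_3, k_4, and k are
all nonzero.

25. (a) \begin{bmatrix} k_1 & 0 & 0 & 0 \\ 0 & k_2 & 0 & 0 \\ 0 & 0 & k_3 & 0 \\ 0 & 0 & 0 & k_4 \end{bmatrix}

(b) \begin{bmatrix} k & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & k & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}

Answer:

(a) \begin{bmatrix} 1/k_1 & 0 & 0 & 0 \\ 0 & 1/k_2 & 0 & 0 \\ 0 & 0 & 1/k_3 & 0 \\ 0 & 0 & 0 & 1/k_4 \end{bmatrix}


26. (a) \begin{bmatrix} 0 & 0 & 0 & k_1 \\ 0 & 0 & k_2 & 0 \\ 0 & k_3 & 0 & 0 \\ k_4 & 0 & 0 & 0 \end{bmatrix}

(b) \begin{bmatrix} k & 0 & 0 & 0 \\ 1 & k & 0 & 0 \\ 0 & 1 & k & 0 \\ 0 & 0 & 1 & k \end{bmatrix}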

In Exercises 27-28, find all values of c, if any, for which the given matrix is invertible.

27.  \c  c c~ 

1 c c 


Answer: 

$c \neq 0, 1$

28. \[ \begin{bmatrix} c & 1 & 0 \\ 1 & c & 1 \\ 0 & 1 & c \end{bmatrix} \]

In  Exercises  29-32,  write  the  given  matrix  as  a product  of  elementary  matrices. 


30.  I"  1 0" 

-5  2_ 

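31. \[ \begin{bmatrix} 1 & 0 & -2 \\ 0 & 4 & 3 \\ 0 & 0 & 1 \end{bmatrix} \]

Answer:

\[ \begin{bmatrix} 1 & 0 & -2 \\ 0 & 4 & 3 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

32. \[ \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix} \]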

In  Exercises  33-36,  write  the  inverse  of  the  given  matrix  as  a product  of  elementary  matrices. 

33.  The  matrix  in  Exercise  29. 

Answer: 


1 O' 

“7  0 

‘l 

-\ 

'1  o' 

-1  1_ 

4 

0 1 

_0 

1_ 

0 1 

34.  The  matrix  in  Exercise  30. 

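35. The matrix in Exercise 31.

Answer:

\[ \begin{bmatrix} 1 & 0 & 2 \\ 0 & \tfrac14 & -\tfrac34 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \tfrac14 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]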

36.  The  matrix  in  Exercise  32. 

In  Exercises  37-38,  show  that  the  given  matrices  A and  B are  row  equivalent,  and  find  a sequence  of 
elementary  row  operations  that  produces  B from  A. 


37. \[ A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 4 & 1 \\ 2 & 1 & 9 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 5 \\ 0 & 2 & -2 \\ 1 & 1 & 4 \end{bmatrix} \]

Answer:

Add -1 times the first row to the second row. Add -1 times the first row to the third row. Add -1 times
the second row to the first row. Add the second row to the third row.


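38. \[ A = \begin{bmatrix} 2 & 1 & 0 \\ -1 & 1 & 0 \\ 3 & 0 & -1 \end{bmatrix}, \quad B = \begin{bmatrix} 6 & 9 & 4 \\ -5 & -1 & 0 \\ -1 & -2 & -1 \end{bmatrix} \]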

39. Show that if

\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ a & b & c \end{bmatrix} \]

is an elementary matrix, then at least one entry in the third row must be a zero.


40. Show that

\[ A = \begin{bmatrix} 0 & a & 0 & 0 & 0 \\ b & 0 & c & 0 & 0 \\ 0 & d & 0 & e & 0 \\ 0 & 0 & f & 0 & g \\ 0 & 0 & 0 & h & 0 \end{bmatrix} \]

is not invertible for any values of the entries.


41. Prove that if A and B are m x n matrices, then A and B are row equivalent if and only if A and B have the
same reduced row echelon form.


42.  Prove  that  if  A is  an  invertible  matrix  and  B is  row  equivalent  to  A,  then  B is  also  invertible. 

43.  Show  that  if  B is  obtained  from  A by  performing  a sequence  of  elementary  row  operations,  then  there  is  a 
second  sequence  of  elementary  row  operations,  which  when  applied  to  B recovers  A. 


True-False  Exercises 


In  parts  (a)-(g)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  The  product  of  two  elementary  matrices  of  the  same  size  must  be  an  elementary  matrix. 

Answer: 

False 

(b)  Every  elementary  matrix  is  invertible. 

Answer: 

True 

(c)  If  A and  B are  row  equivalent,  and  if  B and  C are  row  equivalent,  then  A and  C are  row  equivalent. 

Answer: 

True 

(d)  If  A is  an  n x n matrix  that  is  not  invertible,  then  the  linear  system  Ax  = 0 has  infinitely  many  solutions. 
Answer: 

True 

(e)  If  A is  an  n x n matrix  that  is  not  invertible,  then  the  matrix  obtained  by  interchanging  two  rows  of  A cannot 
be  invertible. 

Answer: 


True 


(f) If A is invertible and a multiple of the first row of A is added to the second row, then the resulting matrix
is invertible.

Answer: 

True 

(g)  An  expression  of  the  invertible  matrix  A as  a product  of  elementary  matrices  is  unique. 

Answer: 

False 
1.6  More on Linear Systems and Invertible Matrices

In  this  section  we  will  show  how  the  inverse  of  a matrix  can  be  used  to  solve  a linear  system  and  we  will  develop  some  more  results  about 
invertible  matrices. 


Number  of  Solutions  of  a Linear  System 

In Section 1.1 we made the statement (based on Figures 1.1.1 and 1.1.2) that every linear system has no solutions, has exactly one solution,
or has infinitely many solutions. We are now in a position to prove this fundamental result.


THEOREM  1.6.1 

A system  of  linear  equations  has  zero,  one,  or  infinitely  many  solutions.  There  are  no  other  possibilities. 


If  Ax  = b is  a system  of  linear  equations,  exactly  one  of  the  following  is  true:  (a)  the  system  has  no  solutions,  (b)  the  system  has  exactly 
one  solution,  or  (c)  the  system  has  more  than  one  solution.  The  proof  will  be  complete  if  we  can  show  that  the  system  has  infinitely  many  solutions 
in  case  (c). 


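Assume that $A\mathbf{x} = \mathbf{b}$ has more than one solution, and let $\mathbf{x}_0 = \mathbf{x}_1 - \mathbf{x}_2$, where $\mathbf{x}_1$ and $\mathbf{x}_2$ are any two distinct solutions. Because $\mathbf{x}_1$ and $\mathbf{x}_2$ are
distinct, the matrix $\mathbf{x}_0$ is nonzero; moreover,

\[ A\mathbf{x}_0 = A(\mathbf{x}_1 - \mathbf{x}_2) = A\mathbf{x}_1 - A\mathbf{x}_2 = \mathbf{b} - \mathbf{b} = \mathbf{0} \]

If we now let $k$ be any scalar, then

\[ A(\mathbf{x}_1 + k\mathbf{x}_0) = A\mathbf{x}_1 + A(k\mathbf{x}_0) = A\mathbf{x}_1 + k(A\mathbf{x}_0) = \mathbf{b} + k\mathbf{0} = \mathbf{b} + \mathbf{0} = \mathbf{b} \]

But this says that $\mathbf{x}_1 + k\mathbf{x}_0$ is a solution of $A\mathbf{x} = \mathbf{b}$. Since $\mathbf{x}_0$ is nonzero and there are infinitely many choices for $k$, the system $A\mathbf{x} = \mathbf{b}$ has
infinitely many solutions.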
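The construction in the proof can be tried numerically. The following is a minimal sketch, assuming a small singular system chosen here purely for illustration (the matrix `A`, the vector `b`, and the two solutions are not from the text); it checks that every vector of the form x1 + k x0 solves the system.

```python
def mat_vec(A, x):
    """Return the product A x for a matrix A (list of rows) and a vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Hypothetical singular system chosen for illustration (not from the text).
A = [[1, 2], [2, 4]]
b = [3, 6]
x1 = [3, 0]   # one solution of A x = b
x2 = [1, 1]   # a second, distinct solution

# x0 = x1 - x2 is nonzero and satisfies A x0 = 0, as in the proof.
x0 = [p - q for p, q in zip(x1, x2)]

def shifted_solution(k):
    """Return x1 + k*x0, which the proof shows is again a solution."""
    return [p + k * q for p, q in zip(x1, x0)]

# Each choice of k gives a different solution of the same system.
for k in range(-3, 4):
    assert mat_vec(A, shifted_solution(k)) == b
```

Because x0 is nonzero, different values of k give different vectors, which is the source of the infinitely many solutions.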


Solving  Linear  Systems  by  Matrix  Inversion 

Thus far we have studied two procedures for solving linear systems: Gauss-Jordan elimination and Gaussian elimination. The following theorem
provides  an  actual  formula  for  the  solution  of  a linear  system  of  n equations  in  n unknowns  in  the  case  where  the  coefficient  matrix  is  invertible. 


THEOREM  1.6.2 

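If $A$ is an invertible n x n matrix, then for each n x 1 matrix $\mathbf{b}$, the system of equations $A\mathbf{x} = \mathbf{b}$ has exactly one solution, namely, $\mathbf{x} = A^{-1}\mathbf{b}$.


Proof  Since $A(A^{-1}\mathbf{b}) = \mathbf{b}$, it follows that $\mathbf{x} = A^{-1}\mathbf{b}$ is a solution of $A\mathbf{x} = \mathbf{b}$. To show that this is the only solution, we will assume that $\mathbf{x}_0$ is an
arbitrary solution and then show that $\mathbf{x}_0$ must be the solution $A^{-1}\mathbf{b}$.

If $\mathbf{x}_0$ is any solution of $A\mathbf{x} = \mathbf{b}$, then $A\mathbf{x}_0 = \mathbf{b}$. Multiplying both sides of this equation on the left by $A^{-1}$, we obtain $\mathbf{x}_0 = A^{-1}\mathbf{b}$.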

EXAMPLE  1 Solution  of  a Linear  System  Using  A-1 

Consider  the  system  of  linear  equations 

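\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 5 \\
2x_1 + 5x_2 + 3x_3 &= 3 \\
x_1 + 8x_3 &= 17
\end{aligned}
\]

In matrix form this system can be written as $A\mathbf{x} = \mathbf{b}$, where

\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 5 \\ 3 \\ 17 \end{bmatrix} \]

In Example 4 of the preceding section, we showed that $A$ is invertible and

\[ A^{-1} = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix} \]

By Theorem 1.6.2, the solution of the system is

\[ \mathbf{x} = A^{-1}\mathbf{b} = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix} \begin{bmatrix} 5 \\ 3 \\ 17 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} \]

or $x_1 = 1$, $x_2 = -1$, $x_3 = 2$.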


Keep  in  mind  that  the  method  of  Example  1 only  applies  when  the 
system  has  as  many  equations  as  unknowns  and  the  coefficient 
matrix  is  invertible. 
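The computation in Example 1 can be checked mechanically. The sketch below uses the matrices of the example with two small helper functions, `mat_vec` and `mat_mul`, that are written here only for illustration; it confirms that the stated inverse really is an inverse and that multiplying it by b gives the solution.

```python
def mat_vec(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def mat_mul(M, N):
    """Return the matrix product M N (both given as lists of rows)."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

# Data from Example 1.
A = [[1, 2, 3], [2, 5, 3], [1, 0, 8]]
A_inv = [[-40, 16, 9], [13, -5, -3], [5, -2, -1]]
b = [5, 3, 17]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert mat_mul(A, A_inv) == identity   # A_inv is a right inverse of A
assert mat_mul(A_inv, A) == identity   # and a left inverse

x = mat_vec(A_inv, b)                  # Theorem 1.6.2: x = A^{-1} b
print(x)                               # [1, -1, 2]
assert mat_vec(A, x) == b              # x satisfies the original system
```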


Linear  Systems  with  a Common  Coefficient  Matrix 

Frequently,  one  is  concerned  with  solving  a sequence  of  systems 

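\[ A\mathbf{x} = \mathbf{b}_1, \quad A\mathbf{x} = \mathbf{b}_2, \quad A\mathbf{x} = \mathbf{b}_3, \; \ldots, \quad A\mathbf{x} = \mathbf{b}_k \]

each of which has the same square coefficient matrix $A$. If $A$ is invertible, then the solutions

\[ \mathbf{x}_1 = A^{-1}\mathbf{b}_1, \quad \mathbf{x}_2 = A^{-1}\mathbf{b}_2, \quad \mathbf{x}_3 = A^{-1}\mathbf{b}_3, \; \ldots, \quad \mathbf{x}_k = A^{-1}\mathbf{b}_k \]

can be obtained with one matrix inversion and k matrix multiplications. An efficient way to do this is to form the partitioned matrix

\[ [\, A \mid \mathbf{b}_1 \mid \mathbf{b}_2 \mid \cdots \mid \mathbf{b}_k \,] \tag{1} \]

in which the coefficient matrix $A$ is "augmented" by all k of the matrices $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_k$, and then reduce (1) to reduced row echelon form by
Gauss-Jordan elimination. In this way we can solve all k systems at once. This method has the added advantage that it applies even when $A$ is not
invertible.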

EXAMPLE  2 Solving  Two  Linear  Systems  at  Once 


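Solve the systems

(a)
\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 4 \\
2x_1 + 5x_2 + 3x_3 &= 5 \\
x_1 + 8x_3 &= 9
\end{aligned}
\]

(b)
\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 1 \\
2x_1 + 5x_2 + 3x_3 &= 6 \\
x_1 + 8x_3 &= -6
\end{aligned}
\]

Solution  The two systems have the same coefficient matrix. If we augment this coefficient matrix with the columns of constants on
the right sides of these systems, we obtain

\[ \left[\begin{array}{ccc|c|c} 1 & 2 & 3 & 4 & 1 \\ 2 & 5 & 3 & 5 & 6 \\ 1 & 0 & 8 & 9 & -6 \end{array}\right] \]

Reducing this matrix to reduced row echelon form yields (verify)

\[ \left[\begin{array}{ccc|c|c} 1 & 0 & 0 & 1 & 2 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & -1 \end{array}\right] \]

It follows from the last two columns that the solution of system (a) is $x_1 = 1$, $x_2 = 0$, $x_3 = 1$ and the solution of system (b) is $x_1 = 2$,
$x_2 = 1$, $x_3 = -1$.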
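The "verify" step in Example 2 can be carried out with a short Gauss-Jordan routine. The following is a minimal sketch, not a production solver: the function name `rref` is chosen here, and exact rational arithmetic from the standard `fractions` module is an assumption made to avoid rounding.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced row echelon form of M (a list of rows), computed exactly."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]
        lead = M[pivot_row][col]
        M[pivot_row] = [v / lead for v in M[pivot_row]]      # make the leading entry 1
        for r in range(rows):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                # Clear the rest of the pivot column.
                M[r] = [a - factor * p for a, p in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M

# Augmented matrix [A | b_a | b_b] from Example 2.
augmented = [[1, 2, 3, 4, 1],
             [2, 5, 3, 5, 6],
             [1, 0, 8, 9, -6]]
R = rref(augmented)
solution_a = [row[3] for row in R]   # column for system (a)
solution_b = [row[4] for row in R]   # column for system (b)
```

Reading the solutions from the last two columns is valid here because the coefficient part reduces to the identity; for a singular coefficient matrix the reduced form would have to be interpreted row by row instead.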


Properties  of  Invertible  Matrices 

Up to now, to show that an n x n matrix $A$ is invertible, it has been necessary to find an n x n matrix $B$ such that

\[ AB = I \quad \text{and} \quad BA = I \]

The next theorem shows that if we produce an n x n matrix $B$ satisfying either condition, then the other condition holds automatically.


THEOREM  1.6.3 

Let  A be  a square  matrix. 

(a) If $B$ is a square matrix satisfying $BA = I$, then $B = A^{-1}$.

(b) If $B$ is a square matrix satisfying $AB = I$, then $B = A^{-1}$.


We will prove part (a) and leave part (b) as an exercise.

Proof (a)  Assume that $BA = I$. If we can show that $A$ is invertible, the proof can be completed by multiplying both sides of $BA = I$ on the right by $A^{-1}$ to
obtain

\[ BAA^{-1} = IA^{-1} \quad \text{or} \quad BI = IA^{-1} \quad \text{or} \quad B = A^{-1} \]

To show that $A$ is invertible, it suffices to show that the system $A\mathbf{x} = \mathbf{0}$ has only the trivial solution (see Theorem 1.5.3). Let $\mathbf{x}_0$ be any solution of
this system. If we multiply both sides of $A\mathbf{x}_0 = \mathbf{0}$ on the left by $B$, we obtain $BA\mathbf{x}_0 = B\mathbf{0}$ or $I\mathbf{x}_0 = \mathbf{0}$ or $\mathbf{x}_0 = \mathbf{0}$. Thus, the system of equations
$A\mathbf{x} = \mathbf{0}$ has only the trivial solution.


Equivalence  Theorem 

We  are  now  in  a position  to  add  two  more  statements  to  the  four  given  in  Theorem  1.5.3. 


THEOREM 1.6.4  Equivalent Statements

If A is an n x n matrix, then the following are equivalent.

(a) A is invertible.

(b) Ax = 0 has only the trivial solution.

(c) The reduced row echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices.

(e) Ax = b is consistent for every n x 1 matrix b.

(f) Ax = b has exactly one solution for every n x 1 matrix b.


It follows from the equivalency of parts (e) and (f) that if you can
show that Ax = b has at least one solution for every n x 1 matrix
b, then you can conclude that it has exactly one solution for every
n x 1 matrix b.

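Proof  Since we proved in Theorem 1.5.3 that (a), (b), (c), and (d) are equivalent, it will be sufficient to prove that (a) $\Rightarrow$ (f) $\Rightarrow$ (e) $\Rightarrow$ (a).

(a) $\Rightarrow$ (f)  This was already proved in Theorem 1.6.2.

(f) $\Rightarrow$ (e)  This is self-evident, for if $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every n x 1 matrix $\mathbf{b}$, then $A\mathbf{x} = \mathbf{b}$ is consistent for every n x 1 matrix $\mathbf{b}$.

(e) $\Rightarrow$ (a)  If the system $A\mathbf{x} = \mathbf{b}$ is consistent for every n x 1 matrix $\mathbf{b}$, then, in particular, this is so for the systems

\[ A\mathbf{x} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad A\mathbf{x} = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \; \ldots, \quad A\mathbf{x} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \]

Let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ be solutions of the respective systems, and let us form an n x n matrix $C$ having these solutions as columns. Thus $C$ has the form

\[ C = [\, \mathbf{x}_1 \mid \mathbf{x}_2 \mid \cdots \mid \mathbf{x}_n \,] \]

As discussed in Section 1.3, the successive columns of the product $AC$ will be

\[ A\mathbf{x}_1, \; A\mathbf{x}_2, \; \ldots, \; A\mathbf{x}_n \]

[see Formula 8 of Section 1.3]. Thus,

\[ AC = [\, A\mathbf{x}_1 \mid A\mathbf{x}_2 \mid \cdots \mid A\mathbf{x}_n \,] = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I \]

By part (b) of Theorem 1.6.3, it follows that $C = A^{-1}$. Thus, $A$ is invertible.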


We know from earlier work that invertible matrix factors produce an invertible product. Conversely, the following theorem shows that if the
product of square matrices is invertible, then the factors themselves must be invertible.


THEOREM  1.6.5 

Let  A and  B be  square  matrices  of  the  same  size.  If  AB  is  invertible,  then  A and  B must  also  be  invertible. 


In  our  later  work  the  following  fundamental  problem  will  occur  frequently  in  various  contexts. 


A Fundamental Problem

Let $A$ be a fixed m x n matrix. Find all m x 1 matrices $\mathbf{b}$ such that the system of equations $A\mathbf{x} = \mathbf{b}$ is consistent.


If $A$ is an invertible matrix, Theorem 1.6.2 completely solves this problem by asserting that for every n x 1 matrix $\mathbf{b}$, the linear system $A\mathbf{x} = \mathbf{b}$ has
the unique solution $\mathbf{x} = A^{-1}\mathbf{b}$. If $A$ is not square, or if $A$ is square but not invertible, then Theorem 1.6.2 does not apply. In these cases the matrix $\mathbf{b}$
must usually satisfy certain conditions in order for $A\mathbf{x} = \mathbf{b}$ to be consistent. The following example illustrates how the methods of Section 1.2 can
be used to determine such conditions.

EXAMPLE  3 Determining  Consistency  by  Elimination 

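What conditions must $b_1$, $b_2$, and $b_3$ satisfy in order for the system of equations

\[
\begin{aligned}
x_1 + x_2 + 2x_3 &= b_1 \\
x_1 + x_3 &= b_2 \\
2x_1 + x_2 + 3x_3 &= b_3
\end{aligned}
\]

to be consistent?

Solution  The augmented matrix is

\[ \left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 1 & 0 & 1 & b_2 \\ 2 & 1 & 3 & b_3 \end{array}\right] \]

which can be reduced to row echelon form as follows:

\[ \left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & -1 & -1 & b_2 - b_1 \\ 0 & -1 & -1 & b_3 - 2b_1 \end{array}\right] \]

-1 times the first row was added to the second and -2 times the first row was added to the third.

\[ \left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & 1 & 1 & b_1 - b_2 \\ 0 & -1 & -1 & b_3 - 2b_1 \end{array}\right] \]

The second row was multiplied by -1.

\[ \left[\begin{array}{ccc|c} 1 & 1 & 2 & b_1 \\ 0 & 1 & 1 & b_1 - b_2 \\ 0 & 0 & 0 & b_3 - b_2 - b_1 \end{array}\right] \]

The second row was added to the third.

It is now evident from the third row in the matrix that the system has a solution if and only if $b_1$, $b_2$, and $b_3$ satisfy the condition

\[ b_3 - b_2 - b_1 = 0 \quad \text{or} \quad b_3 = b_1 + b_2 \]

To express this condition another way, $A\mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b}$ is a matrix of the form

\[ \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_1 + b_2 \end{bmatrix} \]

where $b_1$ and $b_2$ are arbitrary.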
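The row operations of Example 3 can be replayed for specific right-hand sides. This is a small sketch, assuming a helper named `is_consistent` introduced here for illustration; it performs exactly the three steps of the example and then inspects the last entry of the third row.

```python
from fractions import Fraction

def is_consistent(b1, b2, b3):
    """Replay the elimination of Example 3 and report whether the system is consistent."""
    M = [[Fraction(v) for v in row] for row in
         ([1, 1, 2, b1], [1, 0, 1, b2], [2, 1, 3, b3])]
    M[1] = [a - p for a, p in zip(M[1], M[0])]       # R2 <- R2 - R1
    M[2] = [a - 2 * p for a, p in zip(M[2], M[0])]   # R3 <- R3 - 2*R1
    M[1] = [-a for a in M[1]]                        # R2 <- -R2
    M[2] = [a + p for a, p in zip(M[2], M[1])]       # R3 <- R3 + R2
    # The third row is now [0, 0, 0, b3 - b2 - b1]; the system is
    # consistent exactly when that last entry vanishes.
    assert M[2][:3] == [0, 0, 0]
    return M[2][3] == 0

print(is_consistent(1, 2, 3))   # True, since 3 = 1 + 2
print(is_consistent(1, 2, 4))   # False, since 4 != 1 + 2
```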


EXAMPLE  4 Determining  Consistency  by  Elimination 

What conditions must $b_1$, $b_2$, and $b_3$ satisfy in order for the system of equations

\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= b_1 \\
2x_1 + 5x_2 + 3x_3 &= b_2 \\
x_1 + 8x_3 &= b_3
\end{aligned}
\]

to be consistent?

Solution  The augmented matrix is

\[ \left[\begin{array}{ccc|c} 1 & 2 & 3 & b_1 \\ 2 & 5 & 3 & b_2 \\ 1 & 0 & 8 & b_3 \end{array}\right] \]

Reducing this to reduced row echelon form yields (verify)

\[ \left[\begin{array}{ccc|c} 1 & 0 & 0 & -40b_1 + 16b_2 + 9b_3 \\ 0 & 1 & 0 & 13b_1 - 5b_2 - 3b_3 \\ 0 & 0 & 1 & 5b_1 - 2b_2 - b_3 \end{array}\right] \tag{2} \]

In this case there are no restrictions on $b_1$, $b_2$, and $b_3$, so the system has the unique solution

\[ x_1 = -40b_1 + 16b_2 + 9b_3, \quad x_2 = 13b_1 - 5b_2 - 3b_3, \quad x_3 = 5b_1 - 2b_2 - b_3 \tag{3} \]

for all values of $b_1$, $b_2$, and $b_3$.


What does the result in Example 4 tell you about the coefficient
matrix of the system?


Skills 

Determine  whether  a linear  system  of  equations  has  no  solutions,  exactly  one  solution,  or  infinitely  many  solutions. 
Solve a linear system by inverting its coefficient matrix.

Solve  multiple  linear  systems  with  the  same  coefficient  matrix  simultaneously. 


Be  familiar  with  the  additional  conditions  of  invertibility  stated  in  the  Equivalence  Theorem. 


Exercise  Set  1 .6 


In  Exercises  1-8,  solve  the  system  by  inverting  the  coefficient  matrix  and  using  Theorem  1.6.2. 

1. $x_1 + x_2 = 2$
   $5x_1 + 6x_2 = 9$

Answer:

$x_1 = 3$, $x_2 = -1$

2. $4x_1 - 3x_2 = -3$
   $2x_1 - 5x_2 = 9$

3. $x_1 + 3x_2 + x_3 = 4$
   $2x_1 + 2x_2 + x_3 = -1$
   $2x_1 + 3x_2 + x_3 = 3$

Answer:

$x_1 = -1$, $x_2 = 4$, $x_3 = -7$

4. $5x_1 + 3x_2 + 2x_3 = 4$
   $3x_1 + 3x_2 + 2x_3 = 2$
   $x_2 + x_3 = 5$

5. $x + y + z = 5$
   $x + y - 4z = 10$
   $-4x + y + z = 0$

Answer:

$x = 1$, $y = 5$, $z = -1$

6. $-x - 2y - 3z = 0$
   $w + x + 4y + 4z = 7$
   $w + 3x + 7y + 9z = 4$
   $-w - 2x - 4y - 6z = 6$

7. $3x_1 + 5x_2 = b_1$
   $x_1 + 2x_2 = b_2$

Answer:

$x_1 = 2b_1 - 5b_2$, $x_2 = -b_1 + 3b_2$

8. $x_1 + 2x_2 + 3x_3 = b_1$
   $2x_1 + 5x_2 + 5x_3 = b_2$
   $3x_1 + 5x_2 + 2x_3 = b_3$

In  Exercises  9-12,  solve  the  linear  systems  together  by  reducing  the  appropriate  augmented  matrix. 

9. $x_1 - 5x_2 = b_1$
   $3x_1 + 2x_2 = b_2$

(i) $b_1 = 1$, $b_2 = 4$

(ii) $b_1 = -2$, $b_2 = 5$

Answer:

(i) $x_1 = \frac{22}{17}$, $x_2 = \frac{1}{17}$

(ii) $x_1 = \frac{21}{17}$, $x_2 = \frac{11}{17}$

10. $-x_1 + 4x_2 + x_3 = b_1$
    $x_1 + 9x_2 - 2x_3 = b_2$
    $6x_1 + 4x_2 - 8x_3 = b_3$

(i) $b_1 = 0$, $b_2 = 1$, $b_3 = 0$

(ii) $b_1 = -3$, $b_2 = 4$, $b_3 = -5$

11. $4x_1 - 7x_2 = b_1$
    $x_1 + 2x_2 = b_2$

(i) $b_1 = 0$, $b_2 = 1$

(ii) $b_1 = -4$, $b_2 = 6$

(iii) $b_1 = -1$, $b_2 = 3$

(iv) $b_1 = -5$, $b_2 = 1$

Answer:

(i) $x_1 = \frac{7}{15}$, $x_2 = \frac{4}{15}$

(ii) $x_1 = \frac{34}{15}$, $x_2 = \frac{28}{15}$

(iii) $x_1 = \frac{19}{15}$, $x_2 = \frac{13}{15}$

(iv) $x_1 = -\frac{1}{5}$, $x_2 = \frac{3}{5}$

12. $x_1 + 3x_2 + 5x_3 = b_1$
    $-x_1 - 2x_2 = b_2$
    $2x_1 + 5x_2 + 4x_3 = b_3$

(i) $b_1 = 1$, $b_2 = 0$, $b_3 = -1$

(ii) $b_1 = 0$, $b_2 = 1$, $b_3 = 1$

(iii) $b_1 = -1$, $b_2 = -1$, $b_3 = 0$

In Exercises 13-17, determine conditions on the $b_i$'s, if any, in order to guarantee that the linear system is consistent.

13. $x_1 + 3x_2 = b_1$
    $-2x_1 + x_2 = b_2$

Answer:

No conditions on $b_1$ and $b_2$

14. $6x_1 - 4x_2 = b_1$
    $2x_1 - 2x_2 = b_2$

15. $x_1 - 2x_2 + 5x_3 = b_1$
    $4x_1 - 5x_2 + 8x_3 = b_2$
    $-3x_1 + 3x_2 - 3x_3 = b_3$

Answer:

$b_3 = b_1 - b_2$

16. $x_1 - 2x_2 - x_3 = b_1$
    $-4x_1 + 5x_2 + 2x_3 = b_2$
    $-4x_1 + 7x_2 + 4x_3 = b_3$

17. $x_1 - x_2 + 3x_3 + 2x_4 = b_1$
    $-2x_1 + x_2 + 5x_3 + x_4 = b_2$
    $-3x_1 + 2x_2 + 2x_3 - x_4 = b_3$
    $4x_1 - 3x_2 + x_3 + 3x_4 = b_4$

Answer:

$b_1 = b_3 + b_4$, $b_2 = 2b_3 + b_4$


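18. Consider the matrices

\[ A = \begin{bmatrix} 2 & 1 & 2 \\ 2 & 2 & -2 \\ 3 & 1 & 1 \end{bmatrix} \quad \text{and} \quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \]

(a) Show that the equation $A\mathbf{x} = \mathbf{x}$ can be rewritten as $(A - I)\mathbf{x} = \mathbf{0}$ and use this result to solve $A\mathbf{x} = \mathbf{x}$ for $\mathbf{x}$.

(b) Solve $A\mathbf{x} = 4\mathbf{x}$.

In Exercises 19-20, solve the given matrix equation for X.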


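19. \[ \begin{bmatrix} 1 & -1 & 1 \\ 2 & 3 & 0 \\ 0 & 2 & -1 \end{bmatrix} X = \begin{bmatrix} 2 & -1 & 5 & 7 & 8 \\ 4 & 0 & -3 & 0 & 1 \\ 3 & 5 & -7 & 2 & 1 \end{bmatrix} \]

Answer:

\[ X = \begin{bmatrix} 11 & 12 & -3 & 27 & 26 \\ -6 & -8 & 1 & -18 & -17 \\ -15 & -21 & 9 & -38 & -35 \end{bmatrix} \]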

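20. \[ \begin{bmatrix} -2 & 0 & 1 \\ 0 & -1 & -1 \\ 1 & 1 & -4 \end{bmatrix} X = \begin{bmatrix} 4 & 3 & 2 & 1 \\ 6 & 7 & 8 & 9 \\ 1 & 3 & 7 & 9 \end{bmatrix} \]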

21. Let $A\mathbf{x} = \mathbf{0}$ be a homogeneous system of n linear equations in n unknowns that has only the trivial solution. Show that if k is any positive
integer, then the system $A^k\mathbf{x} = \mathbf{0}$ also has only the trivial solution.

22. Let $A\mathbf{x} = \mathbf{0}$ be a homogeneous system of n linear equations in n unknowns, and let $Q$ be an invertible n x n matrix. Show that $A\mathbf{x} = \mathbf{0}$ has just
the trivial solution if and only if $(QA)\mathbf{x} = \mathbf{0}$ has just the trivial solution.

23. Let $A\mathbf{x} = \mathbf{b}$ be any consistent system of linear equations, and let $\mathbf{x}_1$ be a fixed solution. Show that every solution to the system can be written in
the form $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_0$, where $\mathbf{x}_0$ is a solution to $A\mathbf{x} = \mathbf{0}$. Show also that every matrix of this form is a solution.

24.  Use  part  (a)  of  Theorem  1.6.3  to  prove  part  (b). 

True-False  Exercises 

In  parts  (a)-(g)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a) It is impossible for a system of linear equations to have exactly two solutions.

Answer: 

True 

(b)  If  the  linear  system  Ax  = b has  a unique  solution,  then  the  linear  system  Ax  = c also  must  have  a unique  solution. 

Answer: 

True 

(c) If $A$ and $B$ are n x n matrices such that $AB = I_n$, then $BA = I_n$.

Answer: 

True 

(d) If $A$ and $B$ are row equivalent matrices, then the linear systems $A\mathbf{x} = \mathbf{0}$ and $B\mathbf{x} = \mathbf{0}$ have the same solution set.

Answer: 

True 

(e) If $A$ is an n x n matrix and $S$ is an n x n invertible matrix, then if $\mathbf{x}$ is a solution to the linear system $(S^{-1}AS)\mathbf{x} = \mathbf{b}$, then $S\mathbf{x}$ is a solution to the
linear system $A\mathbf{y} = S\mathbf{b}$.

Answer: 


True 


(f) Let $A$ be an n x n matrix. The linear system $A\mathbf{x} = 4\mathbf{x}$ has a unique solution if and only if $A - 4I$ is an invertible matrix.


Answer: 

True 

(g) Let $A$ and $B$ be n x n matrices. If $A$ or $B$ (or both) are not invertible, then neither is $AB$.
Answer: 

True 
1.7  Diagonal, Triangular, and Symmetric Matrices

In  this  section  we  will  discuss  matrices  that  have  various  special  forms.  These  matrices  arise  in  a wide  variety  of  applications 
and  will  also  play  an  important  role  in  our  subsequent  work. 


Diagonal  Matrices 


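A square matrix in which all the entries off the main diagonal are zero is called a diagonal matrix. Here are some examples:

\[ \begin{bmatrix} 2 & 0 \\ 0 & -5 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 6 & 0 & 0 & 0 \\ 0 & -4 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 8 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \]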


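A general n x n diagonal matrix $D$ can be written as

\[ D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix} \tag{1} \]

A diagonal matrix is invertible if and only if all of its diagonal entries are nonzero; in this case the inverse of (1) is

\[ D^{-1} = \begin{bmatrix} 1/d_1 & 0 & \cdots & 0 \\ 0 & 1/d_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1/d_n \end{bmatrix} \tag{2} \]


Confirm Formula (2) by showing that

\[ DD^{-1} = D^{-1}D = I \]


Powers of diagonal matrices are easy to compute; we leave it for you to verify that if $D$ is the diagonal matrix (1) and $k$ is a
positive integer, then

\[ D^k = \begin{bmatrix} d_1^k & 0 & \cdots & 0 \\ 0 & d_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & d_n^k \end{bmatrix} \tag{3} \]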


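EXAMPLE 1  Inverses and Powers of Diagonal Matrices

If

\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 2 \end{bmatrix} \]

then

\[ A^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\tfrac13 & 0 \\ 0 & 0 & \tfrac12 \end{bmatrix}, \quad A^5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -243 & 0 \\ 0 & 0 & 32 \end{bmatrix}, \quad A^{-5} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\tfrac{1}{243} & 0 \\ 0 & 0 & \tfrac{1}{32} \end{bmatrix} \]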
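Formulas (2) and (3) say that inverting or raising a diagonal matrix to a power only requires working on the diagonal entries. The sketch below, using a helper name `diag_power` chosen here for illustration, stores a diagonal matrix as the list of its diagonal entries and reproduces the values of Example 1 with exact fractions.

```python
from fractions import Fraction

def diag_power(d, k):
    """Return the diagonal entries of D**k, where D = diag(d).

    A negative k is allowed only when every diagonal entry is nonzero,
    which is exactly the invertibility condition for a diagonal matrix.
    """
    if k < 0 and any(x == 0 for x in d):
        raise ZeroDivisionError("D is not invertible: a diagonal entry is zero")
    return [Fraction(x) ** k for x in d]

d = [1, -3, 2]                    # diagonal of the matrix A in Example 1
inverse = diag_power(d, -1)       # diagonal of A^{-1}
fifth = diag_power(d, 5)          # diagonal of A^5
inverse_fifth = diag_power(d, -5) # diagonal of A^{-5}
```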

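Matrix products that involve diagonal factors are especially easy to compute. For example,

\[ \begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix} = \begin{bmatrix} d_1a_{11} & d_1a_{12} & d_1a_{13} & d_1a_{14} \\ d_2a_{21} & d_2a_{22} & d_2a_{23} & d_2a_{24} \\ d_3a_{31} & d_3a_{32} & d_3a_{33} & d_3a_{34} \end{bmatrix} \]

\[ \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \end{bmatrix} \begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix} = \begin{bmatrix} d_1a_{11} & d_2a_{12} & d_3a_{13} \\ d_1a_{21} & d_2a_{22} & d_3a_{23} \\ d_1a_{31} & d_2a_{32} & d_3a_{33} \\ d_1a_{41} & d_2a_{42} & d_3a_{43} \end{bmatrix} \]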


In  words,  to  multiply  a matrix  A on  the  left  by  a diagonal  matrix  D,  one  can  multiply  successive  rows  of  A by  the 
successive  diagonal  entries  of  D,  and  to  multiply  A on  the  right  by  D,  one  can  multiply  successive  columns  of  A by  the 
successive  diagonal  entries  of  D. 
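The row-scaling and column-scaling rule can be compared against ordinary matrix multiplication. This is a minimal sketch; the helper names and the numeric matrices `A` and `B` are hypothetical choices made here, not taken from the text.

```python
def mat_mul(M, N):
    """Return the matrix product M N (both given as lists of rows)."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def diag(d):
    """Build the full diagonal matrix with diagonal entries d."""
    n = len(d)
    return [[d[i] if i == j else 0 for j in range(n)] for i in range(n)]

def scale_rows(d, A):
    """Multiply row i of A by d[i]; this equals diag(d) A."""
    return [[d[i] * a for a in row] for i, row in enumerate(A)]

def scale_cols(A, d):
    """Multiply column j of A by d[j]; this equals A diag(d)."""
    return [[a * d[j] for j, a in enumerate(row)] for row in A]

d = [2, -1, 3]
A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]     # 3 x 4, multiplied on the left by D
B = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]   # 4 x 3, multiplied on the right by D

left = mat_mul(diag(d), A)
right = mat_mul(B, diag(d))
assert left == scale_rows(d, A)
assert right == scale_cols(B, d)
```

Scaling rows or columns directly avoids the multiplications by zero that a general matrix product would perform.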


Triangular  Matrices 

A square  matrix  in  which  all  the  entries  above  the  main  diagonal  are  zero  is  called  lower  triangular , and  a square  matrix  in 
which  all  the  entries  below  the  main  diagonal  are  zero  is  called  upper  triangular.  A matrix  that  is  either  upper  triangular  or 
lower  triangular  is  called  triangular. 

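EXAMPLE 2  Upper and Lower Triangular Matrices

\[ \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{bmatrix} \qquad \begin{bmatrix} a_{11} & 0 & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ a_{31} & a_{32} & a_{33} & 0 \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} \]

A general 4 x 4 upper triangular matrix (left) and a general 4 x 4 lower triangular matrix (right).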

Observe  that  diagonal  matrices  are  both  upper  triangular  and  lower  triangular  since  they  have  zeros  below  and 
above  the  main  diagonal.  Observe  also  that  a square  matrix  in  row  echelon  form  is  upper  triangular  since  it  has  zeros  below 
the  main  diagonal. 


Properties  of  Triangular  Matrices 


Example 2 illustrates the following four facts about triangular matrices that we will state without formal proof.

A square matrix $A = [a_{ij}]$ is upper triangular if and only if all entries to the left of the main diagonal are zero; that is,
$a_{ij} = 0$ if $i > j$ (Figure 1.7.1).

A square matrix $A = [a_{ij}]$ is lower triangular if and only if all entries to the right of the main diagonal are zero; that is,
$a_{ij} = 0$ if $i < j$ (Figure 1.7.1).

A square matrix $A = [a_{ij}]$ is upper triangular if and only if the ith row starts with at least $i - 1$ zeros for every i.

A square matrix $A = [a_{ij}]$ is lower triangular if and only if the jth column starts with at least $j - 1$ zeros for every j.

Figure 1.7.1  [Regions $i < j$ (above the main diagonal) and $i > j$ (below the main diagonal) of a square matrix.]

The following theorem lists some of the basic properties of triangular matrices.

THEOREM  1.7.1 

(a)  The  transpose  of  a lower  triangular  matrix  is  upper  triangular,  and  the  transpose  of  an  upper  triangular  matrix  is 
lower  triangular. 

(b)  The  product  of  lower  triangular  matrices  is  lower  triangular,  and  the  product  of  upper  triangular  matrices  is  upper 
triangular. 

(c)  A triangular  matrix  is  invertible  if  and  only  if  its  diagonal  entries  are  all  nonzero. 

(d)  The  inverse  of  an  invertible  lower  triangular  matrix  is  lower  triangular,  and  the  inverse  of  an  invertible  upper 
triangular  matrix  is  upper  triangular. 


Part  (a)  is  evident  from  the  fact  that  transposing  a square  matrix  can  be  accomplished  by  reflecting  the  entries  about  the  main 
diagonal;  we  omit  the  formal  proof.  We  will  prove  (b),  but  we  will  defer  the  proofs  of  (c)  and  (d)  to  the  next  chapter,  where 
we  will  have  the  tools  to  prove  those  results  more  efficiently. 

Proof (b)  We will prove the result for lower triangular matrices; the proof for upper triangular matrices is similar. Let
$A = [a_{ij}]$ and $B = [b_{ij}]$ be lower triangular n x n matrices, and let $C = [c_{ij}]$ be the product $C = AB$. We can prove that $C$
is lower triangular by showing that $c_{ij} = 0$ for $i < j$. But from the definition of matrix multiplication,

\[ c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} \]

If we assume that $i < j$, then the terms in this expression can be grouped as follows:

\[ c_{ij} = \underbrace{a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{i(j-1)}b_{(j-1)j}}_{\text{row number of } b \text{ is less than the column number of } b} + \underbrace{a_{ij}b_{jj} + \cdots + a_{in}b_{nj}}_{\text{row number of } a \text{ is less than the column number of } a} \]

In the first grouping all of the b factors are zero since $B$ is lower triangular, and in the second grouping all of the a factors are
zero since $A$ is lower triangular. Thus, $c_{ij} = 0$, which is what we wanted to prove.
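Part (b) of Theorem 1.7.1 can be spot-checked on a concrete pair of matrices. The following sketch assumes two lower triangular matrices `L1` and `L2` picked here only for illustration; a numerical check of this kind does not replace the proof, it merely illustrates the conclusion.

```python
def mat_mul(M, N):
    """Return the matrix product M N (both given as lists of rows)."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def is_lower_triangular(M):
    """True when every entry above the main diagonal (i < j) is zero."""
    n = len(M)
    return all(M[i][j] == 0 for i in range(n) for j in range(i + 1, n))

def transpose(M):
    """Return the transpose of M."""
    return [list(col) for col in zip(*M)]

# Hypothetical lower triangular matrices chosen for illustration.
L1 = [[1, 0, 0], [2, 3, 0], [4, 5, 6]]
L2 = [[7, 0, 0], [8, 9, 0], [1, 2, 3]]

product = mat_mul(L1, L2)
assert is_lower_triangular(product)                 # Theorem 1.7.1(b)
assert not is_lower_triangular(transpose(L1))       # the transpose is upper triangular, part (a)
```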


EXAMPLE  3 Computations  with  Triangular  Matrices 


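Consider the upper triangular matrices

\[ A = \begin{bmatrix} 1 & 3 & -1 \\ 0 & 2 & 4 \\ 0 & 0 & 5 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & -2 & 2 \\ 0 & 0 & -1 \\ 0 & 0 & 1 \end{bmatrix} \]

It follows from part (c) of Theorem 1.7.1 that the matrix $A$ is invertible but the matrix $B$ is not. Moreover, the
theorem also tells us that $A^{-1}$, $AB$, and $BA$ must be upper triangular. We leave it for you to confirm these three
statements by showing that

\[ A^{-1} = \begin{bmatrix} 1 & -\tfrac32 & \tfrac75 \\ 0 & \tfrac12 & -\tfrac25 \\ 0 & 0 & \tfrac15 \end{bmatrix}, \quad AB = \begin{bmatrix} 3 & -2 & -2 \\ 0 & 0 & 2 \\ 0 & 0 & 5 \end{bmatrix}, \quad BA = \begin{bmatrix} 3 & 5 & -1 \\ 0 & 0 & -5 \\ 0 & 0 & 5 \end{bmatrix} \]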

Symmetric  Matrices 

DEFINITION 1

A square matrix $A$ is said to be symmetric if $A = A^T$.


It is easy to recognize a symmetric matrix by
inspection: The entries on the main diagonal have no
restrictions, but mirror images of entries across the
main diagonal must be equal. Here is a picture using
the second matrix in Example 4:

\[ \begin{bmatrix} 1 & 4 & 5 \\ 4 & -3 & 0 \\ 5 & 0 & 7 \end{bmatrix} \]

All diagonal matrices, such as the third matrix in
Example 4, obviously have this property.


EXAMPLE  4 Symmetric  Matrices 


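The following matrices are symmetric, since each is equal to its own transpose (verify).

\[ \begin{bmatrix} 7 & -3 \\ -3 & 5 \end{bmatrix}, \quad \begin{bmatrix} 1 & 4 & 5 \\ 4 & -3 & 0 \\ 5 & 0 & 7 \end{bmatrix}, \quad \begin{bmatrix} d_1 & 0 & 0 & 0 \\ 0 & d_2 & 0 & 0 \\ 0 & 0 & d_3 & 0 \\ 0 & 0 & 0 & d_4 \end{bmatrix} \]

It follows from Formula 11 of Section 1.3 that a square matrix $A = [a_{ij}]$ is symmetric if and only if

\[ (A)_{ij} = (A)_{ji} \tag{4} \]

for all values of i and j.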

The  following  theorem  lists  the  main  algebraic  properties  of  symmetric  matrices.  The  proofs  are  direct  consequences  of 
Theorem  1.4.8  and  are  omitted. 


THEOREM  1.7.2 

If  A and  B are  symmetric  matrices  with  the  same  size,  and  if  k is  any  scalar,  then: 

(a)  AT  is  symmetric. 

(b)  A + B and  A — B are  symmetric. 

(c)  kA  is  symmetric. 


It is not true, in general, that the product of symmetric matrices is symmetric. To see why this is so, let $A$ and $B$ be symmetric
matrices with the same size. Then it follows from part (e) of Theorem 1.4.8 and the symmetry of $A$ and $B$ that

\[ (AB)^T = B^TA^T = BA \]

Thus, $(AB)^T = AB$ if and only if $AB = BA$, that is, if and only if $A$ and $B$ commute. In summary, we have the following
result.


THEOREM  1.7.3 

The  product  of  two  symmetric  matrices  is  symmetric  if  and  only  if  the  matrices  commute. 
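Theorem 1.7.3 can be illustrated with a quick computation. In the sketch below the symmetric matrices `A`, `B`, and `C` are illustrative choices made here (one pair that does not commute and one pair that does), and the helper functions are written only for this check.

```python
def mat_mul(M, N):
    """Return the matrix product M N (both given as lists of rows)."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def is_symmetric(M):
    """True when M equals its own transpose."""
    return M == [list(col) for col in zip(*M)]

def commute(M, N):
    """True when MN = NM."""
    return mat_mul(M, N) == mat_mul(N, M)

# Illustrative symmetric matrices (assumptions of this sketch).
A = [[1, 2], [2, 3]]
B = [[-4, 1], [1, 0]]
C = [[-4, 3], [3, -1]]

# A and B do not commute, and their product is not symmetric.
assert not commute(A, B) and not is_symmetric(mat_mul(A, B))
# A and C commute, and their product is symmetric.
assert commute(A, C) and is_symmetric(mat_mul(A, C))
```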


EXAMPLE  5 Products  of  Symmetric  Matrices 

The  first  of  the  following  equations  shows  a product  of  symmetric  matrices  that  is  not  symmetric,  and  the 
second  shows  a product  of  symmetric  matrices  that  is  symmetric.  We  conclude  that  the  factors  in  the  first 
equation  do  not  commute,  but  those  in  the  second  equation  do.  We  leave  it  for  you  to  verify  that  this  is  so. 


[Two matrix equations, each with left factor $\begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}$; the remaining entries are not legible in this copy.]


Invertibility  of  Symmetric  Matrices 


In  general,  a symmetric  matrix  need  not  be  invertible.  For  example,  a diagonal  matrix  with  a zero  on  the  main  diagonal  is 


symmetric  but  not  invertible.  However,  the  following  theorem  shows  that  if  a symmetric  matrix  happens  to  be  invertible,  then 
its  inverse  must  also  be  symmetric. 


THEOREM  1.7.4 

If $A$ is an invertible symmetric matrix, then $A^{-1}$ is symmetric.


Proof  Assume that $A$ is symmetric and invertible. From Theorem 1.4.9 and the fact that $A = A^T$, we have

\[ (A^{-1})^T = (A^T)^{-1} = A^{-1} \]

which proves that $A^{-1}$ is symmetric.


Products $AA^T$ and $A^TA$


Matrix products of the form $AA^T$ and $A^TA$ arise in a variety of applications. If $A$ is an m x n matrix, then $A^T$ is an n x m
matrix, so the products $AA^T$ and $A^TA$ are both square matrices: the matrix $AA^T$ has size m x m, and the matrix $A^TA$ has size
n x n. Such products are always symmetric since

\[ (AA^T)^T = (A^T)^TA^T = AA^T \quad \text{and} \quad (A^TA)^T = A^T(A^T)^T = A^TA \]


EXAMPLE  6 The  Product  of  a Matrix  and  Its  Transpose  Is  Symmetric 


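Let $A$ be the 2 x 3 matrix

\[ A = \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix} \]

Then

\[ A^TA = \begin{bmatrix} 1 & 3 \\ -2 & 0 \\ 4 & -5 \end{bmatrix} \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix} = \begin{bmatrix} 10 & -2 & -11 \\ -2 & 4 & -8 \\ -11 & -8 & 41 \end{bmatrix} \]

\[ AA^T = \begin{bmatrix} 1 & -2 & 4 \\ 3 & 0 & -5 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ -2 & 0 \\ 4 & -5 \end{bmatrix} = \begin{bmatrix} 21 & -17 \\ -17 & 34 \end{bmatrix} \]

Observe that $A^TA$ and $AA^T$ are symmetric as expected.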
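The products in Example 6 are easy to recompute. The sketch below uses the matrix of the example together with helper functions written here for illustration, and checks both the stated products and their symmetry.

```python
def mat_mul(M, N):
    """Return the matrix product M N (both given as lists of rows)."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def transpose(M):
    """Return the transpose of M."""
    return [list(col) for col in zip(*M)]

def is_symmetric(M):
    """True when M equals its own transpose."""
    return M == transpose(M)

A = [[1, -2, 4], [3, 0, -5]]      # the 2 x 3 matrix of Example 6
At = transpose(A)                  # 3 x 2

AtA = mat_mul(At, A)               # 3 x 3 product
AAt = mat_mul(A, At)               # 2 x 2 product
assert is_symmetric(AtA) and is_symmetric(AAt)
```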

Later in this text, we will obtain general conditions on $A$ under which $AA^T$ and $A^TA$ are invertible. However, in the special
case where $A$ is square, we have the following result.


THEOREM 1.7.5

If $A$ is an invertible matrix, then $AA^T$ and $A^TA$ are also invertible.


Proof  Since $A$ is invertible, so is $A^T$ by Theorem 1.4.9. Thus $AA^T$ and $A^TA$ are invertible, since they are the products of
invertible matrices.


Concept  Review 

Diagonal  matrix 
Lower  triangular  matrix 
Upper  triangular  matrix 
Triangular  matrix 
Symmetric  matrix 

Skills 

Determine  whether  a diagonal  matrix  is  invertible  with  no  computations. 
Compute  matrix  products  involving  diagonal  matrices  by  inspection. 

Determine  whether  a matrix  is  triangular. 

Understand  how  the  transpose  operation  affects  diagonal  and  triangular  matrices. 
Understand  how  inversion  affects  diagonal  and  triangular  matrices. 

Determine  whether  a matrix  is  a symmetric  matrix. 


Exercise  Set  1.7 


In Exercises 1-4, determine whether the given matrix is invertible.

1. $\begin{bmatrix} 2 & 0 \\ 0 & -5 \end{bmatrix}$


Answer: 


2. 


3. 


4 0 0 
0 0 0 
0 0 5 

-1  0 0 

0 2 0 

0 0 | 


Answer: 


4. 


-10  0 

0 1 0 

0 0 3 

-10  0 0 
0 3 0 0 

0 0-3  0 

0 0 0 -2 


In  Exercises  5-8,  determine  the  product  by  inspection. 


5. 

"3  0 O' 

2 r 

0-10 

-4  1 

1 

o 

o 

2 5 

Answer: 


6 3 

4 -1 
4 10 


6. 


7. 


1 2 • 

-3  -1 

1 1 

m o 

1 i 

'-4  0 O' 
0 3 0 
0 0 2 

"5  0 0~ 

-3  2 0 4 

0 2 0 

1-530 

CO 

I 

o 

o 

-6  2 2 2 

-4 

3 

2 


Answer: 


'-15 

10 

0 

20 

-20] 

2 ■ 

-10 

6 

0 

6 

) 

18 

-6 

-6 

-6 

-6 

1 

8. 

'2  0 

O' 

4 

-1 

3' 

" 

-3 

0 

O' 

0 -1 

0 

1 

2 

0 

0 

5 

0 

O 

o 

4 

-5 

1 

-2 

0 

0 

2 

In Exercises 9-12, find $A^2$, $A^{-2}$, and $A^{-k}$ (where $k$ is any integer) by inspection.


A = 


1 0 
0 -2 


Answer: 


1 0 
0 4 


10.  -6  0 0 

A=  0 3 0 
0 0 5 


4 


1 0 
0 1 / (—2)k 


12.  [-2  0 0" 

0-4  0 0 

0 0-3  0 

0 0 0 2 

In  Exercises  13-19,  decide  whether  the  given  matrix  is  symmetric. 

13. 1" -8  -8" 

0. 


Answer: 


Not  symmetric 


14. 


Answer: 

Symmetric 

16.  [3  4 

_4  0 

17.  [0  1 2" 

1 5 -6 

2 6 6 

Answer: 

Not  symmetric 

is.  r -i  3" 

-1  5 1 

1 7 

19.ro  o r 
0 2 0 
3 0 0 


Answer: 


Not  symmetric 


In  Exercises  20-22,  decide  by  inspection  whether  the  given  matrix  is  invertible. 

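20. $\begin{bmatrix} -1 & 2 & 4 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}$

21. $\begin{bmatrix} 0 & 1 & -2 & 5 \\ 0 & 1 & 5 & 6 \\ 0 & 0 & -3 & 1 \\ 0 & 0 & 0 & 5 \end{bmatrix}$

Answer:

Not invertible

22. $\begin{bmatrix} 2 & 0 & 0 & 0 \\ -3 & -1 & 0 & 0 \\ -4 & -6 & 0 & 0 \\ 0 & 3 & 8 & -5 \end{bmatrix}$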

In  Exercises  23-24,  find  all  values  of  the  unknown  constant(s)  in  order  for  A to  be  symmetric. 


Answer: 


$a = -8$

24. $A = \begin{bmatrix} 2 & a - 2b + 2c & 2a + b + c \\ 3 & 5 & a + c \\ 0 & -2 & 7 \end{bmatrix}$

In  Exercises  25-26,  find  all  values  of  x in  order  for  A to  be  invertible. 


A=  0 x + 2 x3 

0 0 x-4_ 

Answer: 


$x \ne 1, -2, 4$


In  Exercises  27-28,  find  a diagonal  matrix  A that  satisfies  the  given  condition. 

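27. $A^5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$

Answer:

$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$

28. $A^{-2} = \begin{bmatrix} 9 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{bmatrix}$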

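29. Verify Theorem 1.7.1(b) for the product $AB$, where

$A = \begin{bmatrix} -1 & 2 & 5 \\ 0 & 1 & 3 \\ 0 & 0 & -4 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & -8 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{bmatrix}$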

30. Verify Theorem 1.7.1(d) for the matrices $A$ and $B$ in Exercise 29.

31. Verify Theorem 1.7.4 for the given matrix $A$.


(b)  r i -2  3_ 

A = -2  1 -7 

3-7  4 

32.  Let  A be  an  n x n symmetric  matrix. 

(a) Show that $A^2$ is symmetric.

(b) Show that $2A^2 - 3A + I$ is symmetric.

33. Prove: If $A^TA = A$, then $A$ is symmetric and $A = A^2$.

34. Find all $3 \times 3$ diagonal matrices $A$ that satisfy $A^2 - 3A - 4I = 0$.

35. Let $A = [a_{ij}]$ be an $n \times n$ matrix. Determine whether $A$ is symmetric.

(a) $a_{ij} = i^2 + j^2$

(b) $a_{ij} = i^2 - j^2$

(c) $a_{ij} = 2i + 2j$

(d) $a_{ij} = 2i^2 + 2j^3$

Answer: 

(a)  Yes 

(b)  No  (unless  n = 1) 

(c)  Yes 

(d)  No  (unless  n = 1) 

36. On the basis of your experience with Exercise 35, devise a general test that can be applied to a formula for $a_{ij}$ to determine
whether $A = [a_{ij}]$ is symmetric.

37. A square matrix $A$ is called skew-symmetric if $A^T = -A$.
Prove:

(a) If $A$ is an invertible skew-symmetric matrix, then $A^{-1}$ is skew-symmetric.

(b) If $A$ and $B$ are skew-symmetric matrices, then so are $A^T$, $A + B$, $A - B$, and $kA$ for any scalar $k$.


'2  -8  O' 
= 0 2 1 
0 0 3 


(c) Every square matrix $A$ can be expressed as the sum of a symmetric matrix and a skew-symmetric matrix. [Hint: Note
the identity $A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A - A^T)$.]

In  Exercises  38-39,  fill  in  the  missing  entries  (marked  with  x)  to  produce  a skew-symmetric  matrix. 


38. 

1 

X 

X 

4^ 

A = 

0 x x 

X —1  X 

39. 

1 

X 

O 

X 

A = 

X X -4 

CO 

X 

X 

Answer: 

"0  0 -8' 

0 0-4 
8 4 0 

2a  - 5b  + 5c 
5a  — 8b  + 6c 

d 


40.  Find  all  values  of  a,  b,  c,  and  d for  which  A is  skew- symmetric. 

0 2a  — 3b  + c 


A = 


-2 

-3 


0 

-5 


41. We showed in the text that the product of symmetric matrices is symmetric if and only if the matrices commute. Is the
product of commuting skew-symmetric matrices skew-symmetric? Explain. [Note: See Exercise 37 for the definition of
skew-symmetric.]

42.  If  the  n x n matrix  A can  be  expressed  as  A = LU,  where  L is  a lower  triangular  matrix  and  U is  an  upper  triangular 
matrix,  then  the  linear  system  Ax  = b can  be  expressed  as  LUx  = b and  can  be  solved  in  two  steps: 

Step 1. Let $Ux = y$, so that $LUx = b$ can be expressed as $Ly = b$. Solve this system.

Step  2.  Solve  the  system  Ux  = y for  x. 

In  each  part,  use  this  two-step  method  to  solve  the  given  system. 


(a) 

1 

o 

o 

"2  -l  3:1 

"*1~ 

r 

-2  3 0 

0 1 2 

x2 

= 

-2 

2 4 1 

i 

^r 

o 

o 

*3 

0 

(b) 

o 

o 

CM 

"3  -5  2' 

"*l" 

A 

4 1 0 

-3  -2  3 

0 4 1 

0 0 2 

*2 

/3_ 

— 

-5 

2 

43.  Find  an  upper  triangular  matrix  that  satisfies 


30 

-8 


Answer: 


A = 


1 

0 


10 

-2 


True-False  Exercises 


In  parts  (a)-(m)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 


(a)  The  transpose  of  a diagonal  matrix  is  a diagonal  matrix. 


Answer: 


True 

(b)  The  transpose  of  an  upper  triangular  matrix  is  an  upper  triangular  matrix. 

Answer: 

False 

(c)  The  sum  of  an  upper  triangular  matrix  and  a lower  triangular  matrix  is  a diagonal  matrix. 

Answer: 

False 

(d)  All  entries  of  a symmetric  matrix  are  determined  by  the  entries  occurring  on  and  above  the  main  diagonal. 
Answer: 

True 

(e)  All  entries  of  an  upper  triangular  matrix  are  determined  by  the  entries  occurring  on  and  above  the  main  diagonal. 
Answer: 

True 

(f)  The  inverse  of  an  invertible  lower  triangular  matrix  is  an  upper  triangular  matrix. 

Answer: 

False 

(g)  A diagonal  matrix  is  invertible  if  and  only  if  all  of  its  diagonal  entries  are  positive. 

Answer: 

False 

(h)  The  sum  of  a diagonal  matrix  and  a lower  triangular  matrix  is  a lower  triangular  matrix. 

Answer: 

True 

(i) A matrix that is both symmetric and upper triangular must be a diagonal matrix.

Answer: 

True 

(j) If $A$ and $B$ are $n \times n$ matrices such that $A + B$ is symmetric, then $A$ and $B$ are symmetric.

Answer: 

False 

(k) If $A$ and $B$ are $n \times n$ matrices such that $A + B$ is upper triangular, then $A$ and $B$ are upper triangular.

Answer: 

False 

(l) If $A^2$ is a symmetric matrix, then $A$ is a symmetric matrix.


Answer: 


False 

(m) If $kA$ is a symmetric matrix for some $k \ne 0$, then $A$ is a symmetric matrix.

Answer: 

True 
1.8 Applications of Linear Systems

In  this  section  we  will  discuss  some  relatively  brief  applications  of  linear  systems.  These  are  but  a small  sample  of  the  wide 
variety  of  real-world  problems  to  which  our  study  of  linear  systems  is  applicable. 


Network  Analysis 

The  concept  of  a network  appears  in  a variety  of  applications.  Loosely  stated,  a network  is  a set  of  branches  through  which 
something  “flows.”  For  example,  the  branches  might  be  electrical  wires  through  which  electricity  flows,  pipes  through 
which  water  or  oil  flows,  traffic  lanes  through  which  vehicular  traffic  flows,  or  economic  linkages  through  which  money 
flows,  to  name  a few  possibilities. 

In  most  networks,  the  branches  meet  at  points,  called  nodes  or  junctions , where  the  flow  divides.  For  example,  in  an 
electrical  network,  nodes  occur  where  three  or  more  wires  join,  in  a traffic  network  they  occur  at  street  intersections,  and  in 
a financial  network  they  occur  at  banking  centers  where  incoming  money  is  distributed  to  individuals  or  other  institutions. 

In  the  study  of  networks,  there  is  generally  some  numerical  measure  of  the  rate  at  which  the  medium  flows  through  a 
branch.  For  example,  the  flow  rate  of  electricity  is  often  measured  in  amperes,  the  flow  rate  of  water  or  oil  in  gallons  per 
minute,  the  flow  rate  of  traffic  in  vehicles  per  hour,  and  the  flow  rate  of  European  currency  in  millions  of  Euros  per  day. 

We  will  restrict  our  attention  to  networks  in  which  there  is  flow  conservation  at  each  node,  by  which  we  mean  that  the  rate 
of flow  into  any  node  is  equal  to  the  rate  of flow  out  of  that  node.  This  ensures  that  the  flow  medium  does  not  build  up  at 
the  nodes  and  block  the  free  movement  of  the  medium  through  the  network. 

A common  problem  in  network  analysis  is  to  use  known  flow  rates  in  certain  branches  to  find  the  flow  rates  in  all  of  the 
branches.  Here  is  an  example. 

EXAMPLE  1 Network  Analysis  Using  Linear  Systems 

Figure  1.8.1  shows  a network  with  four  nodes  in  which  the  flow  rate  and  direction  of  flow  in  certain 
branches  are  known.  Find  the  flow  rates  and  directions  of  flow  in  the  remaining  branches. 


Figure 1.8.1


As illustrated in Figure 1.8.2, we have assigned arbitrary directions to the unknown flow rates
$x_1$, $x_2$, and $x_3$. We need not be concerned if some of the directions are incorrect, since an incorrect direction
will be signaled by a negative value for the flow rate when we solve for the unknowns.


Figure 1.8.2


It follows from the conservation of flow at node A that

\[ x_1 + x_2 = 30 \]

Similarly, at the other nodes we have

\[ x_2 + x_3 = 35 \quad (\text{node } B) \]
\[ x_3 + 15 = 60 \quad (\text{node } C) \]
\[ x_1 + 15 = 55 \quad (\text{node } D) \]

These four conditions produce the linear system

\[ \begin{aligned} x_1 + x_2 &= 30 \\ x_2 + x_3 &= 35 \\ x_3 &= 45 \\ x_1 &= 40 \end{aligned} \]

which we can now try to solve for the unknown flow rates. In this particular case the system is sufficiently
simple that it can be solved by inspection (work from the bottom up). We leave it for you to confirm that the
solution is

\[ x_1 = 40, \quad x_2 = -10, \quad x_3 = 45 \]

The fact that $x_2$ is negative tells us that the direction assigned to that flow in Figure 1.8.2 is incorrect; that is,
the flow in that branch is into node A.
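The "work from the bottom up" step of Example 1 can be written out as a short sketch. The variable names mirror $x_1$, $x_2$, $x_3$ from the example; the sign test at the end is an illustrative way of reporting whether an assumed direction needs to be reversed.

```python
# Node equations of Example 1, solved from the bottom up.
x1 = 55 - 15   # node D: x1 + 15 = 55
x3 = 60 - 15   # node C: x3 + 15 = 60
x2 = 35 - x3   # node B: x2 + x3 = 35

# Node A was not used above, so it serves as a consistency check.
node_a_balanced = (x1 + x2 == 30)

# A negative flow rate means the arrow drawn for that branch
# points the wrong way.
for name, value in (("x1", x1), ("x2", x2), ("x3", x3)):
    direction = "as drawn" if value >= 0 else "reversed"
    print(name, "=", value, "(direction", direction + ")")
```

Three of the four node equations already determine the unknowns, which is why the fourth can be used as a check rather than as an extra constraint.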


EXAMPLE  2 Design  of  Traffic  Patterns 

The  network  in  Figure  1.8.3  shows  a proposed  plan  for  the  traffic  flow  around  a new  park  that  will  house  the 
Liberty  Bell  in  Philadelphia,  Pennsylvania.  The  plan  calls  for  a computerized  traffic  light  at  the  north  exit  on 
Fifth  Street,  and  the  diagram  indicates  the  average  number  of  vehicles  per  hour  that  are  expected  to  flow  in 
and  out  of  the  streets  that  border  the  complex.  All  streets  are  one-way. 

(a) How many vehicles per hour should the traffic light let through to ensure that the average number of
vehicles per hour flowing into the complex is the same as the average number of vehicles flowing out?

(b) Assuming that the traffic light has been set to balance the total flow in and out of the complex, what can
you say about the average number of vehicles per hour that will flow along the streets that border the
complex?
Figure 1.8.3


Solution 

(a) If, as indicated in Figure 1.8.3b, we let $x$ denote the number of vehicles per hour that the traffic light must
let through, then the total number of vehicles per hour that flow in and out of the complex will be

Flowing in: $500 + 400 + 600 + 200 = 1700$
Flowing out: $x + 700 + 400$

Equating the flows in and out shows that the traffic light should let $x = 600$ vehicles per hour pass
through.

(b) To avoid traffic congestion, the flow in must equal the flow out at each intersection. For this to happen,
the following conditions must be satisfied:

Intersection    Flow In            Flow Out
A               $400 + 600$    =   $x_1 + x_2$
B               $x_2 + x_3$    =   $400 + x$
C               $500 + 200$    =   $x_3 + x_4$
D               $x_1 + x_4$    =   $700$

Thus, with $x = 600$, as computed in part (a), we obtain the following linear system:

\[ \begin{aligned} x_1 + x_2 &= 1000 \\ x_2 + x_3 &= 1000 \\ x_3 + x_4 &= 700 \\ x_1 + x_4 &= 700 \end{aligned} \]

We leave it for you to show that the system has infinitely many solutions and that these are given by the
parametric equations

\[ x_1 = 700 - t, \quad x_2 = 300 + t, \quad x_3 = 700 - t, \quad x_4 = t \tag{1} \]

However, the parameter $t$ is not completely arbitrary here, since there are physical constraints to be
considered. For example, the average flow rates must be nonnegative since we have assumed the streets
to be one-way, and a negative flow rate would indicate a flow in the wrong direction. This being the
case, we see from (1) that $t$ can be any real number that satisfies $0 \le t \le 700$, which implies that the
average flow rates along the streets will fall in the ranges

\[ 0 \le x_1 \le 700, \quad 300 \le x_2 \le 1000, \quad 0 \le x_3 \le 700, \quad 0 \le x_4 \le 700 \]
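The parametric family in Example 2 and its physical restriction can be checked with a small sketch. The function names `flows`, `satisfies_system`, and `is_feasible` are hypothetical helpers introduced only for this illustration, and the sample values of $t$ are arbitrary.

```python
def flows(t):
    """Flow rates (x1, x2, x3, x4) given by the parametric equations."""
    return (700 - t, 300 + t, 700 - t, t)

def satisfies_system(x1, x2, x3, x4):
    """Check the four intersection equations with x = 600."""
    return (x1 + x2 == 1000 and x2 + x3 == 1000
            and x3 + x4 == 700 and x1 + x4 == 700)

def is_feasible(t):
    """One-way streets require every flow rate to be nonnegative."""
    return all(x >= 0 for x in flows(t))

samples = range(-100, 901, 50)

# Every sampled t solves the linear system ...
all_solve = all(satisfies_system(*flows(t)) for t in samples)

# ... but only some of them respect the one-way constraint.
feasible = [t for t in samples if is_feasible(t)]
print(all_solve, feasible[0], feasible[-1])
```

The sketch separates the two questions the example raises: whether a choice of $t$ solves the equations (always) and whether it is physically meaningful (only on a bounded interval).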


Electrical  Circuits 


Next, we will show how network analysis can be used to analyze electrical circuits consisting of batteries and resistors. A
battery is a source of electric energy, and a resistor, such as a lightbulb, is an element that dissipates electric energy. Figure
1.8.4 shows a schematic diagram of a circuit with one battery (represented by a pair of parallel bars), one resistor (represented by
a zigzag line), and a switch. The battery has a positive pole (+) and a negative pole (-). When the switch is closed,
electrical current is considered to flow from the positive pole of the battery, through the resistor, and back to the negative
pole (indicated by the arrowhead in the figure).

Figure 1.8.4

Electrical current, which is a flow of electrons through wires, behaves much like the flow of water through pipes. A battery
acts like a pump that creates "electrical pressure" to increase the flow rate of electrons, and a resistor acts like a restriction
in a pipe that reduces the flow rate of electrons. The technical term for electrical pressure is electrical potential; it is
commonly measured in volts (V). The degree to which a resistor reduces the electrical potential is called its resistance and
is commonly measured in ohms (Ω). The rate of flow of electrons in a wire is called current and is commonly measured in
amperes (also called amps) (A). The precise effect of a resistor is given by the following law:



Ohm's  Law 

If a current of $I$ amperes passes through a resistor with a resistance of $R$ ohms, then there is a resulting drop of $E$
volts in electrical potential that is the product of the current and resistance; that is,

\[ E = IR \]


A typical  electrical  network  will  have  multiple  batteries  and  resistors  joined  by  some  configuration  of  wires.  A point  at 
which  three  or  more  wires  in  a network  are  joined  is  called  a node  (or  junction  point).  A branch  is  a wire  connecting  two 
nodes,  and  a closed  loop  is  a succession  of  connected  branches  that  begin  and  end  at  the  same  node.  For  example,  the 
electrical  network  in  Figure  1.8.5  has  two  nodes  and  three  closed  loops — two  inner  loops  and  one  outer  loop.  As  current 
flows  through  an  electrical  network,  it  undergoes  increases  and  decreases  in  electrical  potential,  called  voltage  rises  and 
voltage  drops , respectively.  The  behavior  of  the  current  at  the  nodes  and  around  closed  loops  is  governed  by  two 
fundamental  laws: 


Figure  1.8.5 


Kirchhoff's Current Law

The sum of the currents flowing into any node is equal to the sum of the currents flowing out.

Kirchhoff's Voltage Law

In one traversal of any closed loop, the sum of the voltage rises equals the sum of the voltage drops.

Kirchhoff's current law is a restatement of the principle of flow conservation at a node that was stated for general networks.
Thus, for example, the currents at the top node in Figure 1.8.6 satisfy the equation $I_1 = I_2 + I_3$.

Figure 1.8.6


In  circuits  with  multiple  loops  and  batteries  there  is  usually  no  way  to  tell  in  advance  which  way  the  currents  are  flowing, 
so  the  usual  procedure  in  circuit  analysis  is  to  assign  arbitrary  directions  to  the  current  flows  in  the  branches  and  let  the 
mathematical  computations  determine  whether  the  assignments  are  correct.  In  addition  to  assigning  directions  to  the 
current flows, Kirchhoff's voltage law requires a direction of travel for each closed loop. The choice is arbitrary, but for
consistency  we  will  always  take  this  direction  to  be  clockwise  (Figure  1.8.7).  We  also  make  the  following  conventions: 

A voltage  drop  occurs  at  a resistor  if  the  direction  assigned  to  the  current  through  the  resistor  is  the  same  as  the  direction 
assigned  to  the  loop,  and  a voltage  rise  occurs  at  a resistor  if  the  direction  assigned  to  the  current  through  the  resistor  is 
the  opposite  to  that  assigned  to  the  loop. 

A voltage  rise  occurs  at  a battery  if  the  direction  assigned  to  the  loop  is  from  - to  + through  the  battery,  and  a voltage 
drop  occurs  at  a battery  if  the  direction  assigned  to  the  loop  is  from  + to  - through  the  battery. 

If  you  follow  these  conventions  when  calculating  currents,  then  those  currents  whose  directions  were  assigned  correctly 
will  have  positive  values  and  those  whose  directions  were  assigned  incorrectly  will  have  negative  values. 
Figure 1.8.7


EXAMPLE  3 A Circuit  with  One  Closed  Loop 


Determine the current $I$ in the circuit shown in Figure 1.8.8.

Figure 1.8.8

Since the direction assigned to the current through the resistor is the same as the direction of the
loop, there is a voltage drop at the resistor. By Ohm's law this voltage drop is $E = IR = 3I$. Also, since the
direction assigned to the loop is from - to + through the battery, there is a voltage rise of 6 volts at the
battery. Thus, it follows from Kirchhoff's voltage law that

\[ 3I = 6 \]

from which we conclude that the current is $I = 2$ A. Since $I$ is positive, the direction assigned to the current
flow is correct.


EXAMPLE  4 A Circuit  with  Three  Closed  Loops 

Determine the currents $I_1$, $I_2$, and $I_3$ in the circuit shown in Figure 1.8.9.


Figure  1.8.9 


Using the assigned directions for the currents, Kirchhoff's current law provides one equation for
each node:

Node    Current In         Current Out
A       $I_1 + I_2$    =   $I_3$
B       $I_3$          =   $I_1 + I_2$

However, these equations are really the same, since both can be expressed as

\[ I_1 + I_2 - I_3 = 0 \tag{2} \]


Gustav  Kirchhoff  (1824-1887) 

The German physicist Gustav Kirchhoff was a student of Gauss. His work on
Kirchhoff's laws, announced in 1854, was a major advance in the calculation of currents, voltages,
and resistances of electrical circuits. Kirchhoff was severely disabled and spent most of his life on
crutches or in a wheelchair.
To find unique values for the currents we will need two more equations, which we will obtain from
Kirchhoff's voltage law. We can see from the network diagram that there are three closed loops: a left inner
loop containing the 50 V battery, a right inner loop containing the 30 V battery, and an outer loop that
contains both batteries. Thus, Kirchhoff's voltage law will actually produce three equations. With a
clockwise traversal of the loops, the voltage rises and drops in these loops are as follows:

                     Voltage Rises              Voltage Drops
Left Inside Loop     $50$                       $5I_1 + 20I_3$
Right Inside Loop    $30 + 10I_2 + 20I_3$       $0$
Outside Loop         $30 + 50 + 10I_2$          $5I_1$

These conditions can be rewritten as

\[ \begin{aligned} 5I_1 + 20I_3 &= 50 \\ 10I_2 + 20I_3 &= -30 \\ 5I_1 - 10I_2 &= 80 \end{aligned} \tag{3} \]

However, the last equation is superfluous, since it is the difference of the first two. Thus, if we combine (2)
and the first two equations in (3), we obtain the following linear system of three equations in the three
unknown currents:

\[ \begin{aligned} I_1 + I_2 - I_3 &= 0 \\ 5I_1 + 20I_3 &= 50 \\ 10I_2 + 20I_3 &= -30 \end{aligned} \]

We leave it for you to solve this system and show that $I_1 = 6$ A, $I_2 = -5$ A, and $I_3 = 1$ A. The fact that $I_2$
is negative tells us that the direction of this current is opposite to that indicated in Figure 1.8.9.
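The system in Example 4 is left for the reader to solve; one way to confirm the stated currents is Gauss-Jordan elimination with exact fractions. The sketch below is an illustration only: the function name `rref` and the list-of-rows matrix layout are choices made here, not notation from the text.

```python
from fractions import Fraction

def rref(M):
    """Reduced row echelon form of M, computed with exact fractions."""
    A = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0  # index of the next pivot row
    for c in range(cols):
        if r == rows:
            break
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        A[r], A[p] = A[p], A[r]
        lead = A[r][c]
        A[r] = [v / lead for v in A[r]]          # make the leading entry 1
        for i in range(rows):
            if i != r and A[i][c] != 0:
                factor = A[i][c]                 # clear the rest of column c
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
    return A

# Augmented matrix: equation (2) and the first two equations of (3).
augmented = [[1, 1, -1, 0],
             [5, 0, 20, 50],
             [0, 10, 20, -30]]

currents = [row[-1] for row in rref(augmented)]   # [I1, I2, I3]
print(currents)

# The superfluous outer-loop equation 5*I1 - 10*I2 = 80 should also hold.
outer_loop_ok = (5 * currents[0] - 10 * currents[1] == 80)
```

Exact fractions are used instead of floating point so that the comparison with the outer-loop equation is not affected by rounding.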


Balancing  Chemical  Equations 


Chemical  compounds  are  represented  by  chemical  formulas  that  describe  the  atomic  makeup  of  their  molecules.  For 
example,  water  is  composed  of  two  hydrogen  atoms  and  one  oxygen  atom,  so  its  chemical  formula  is  H2O;  and  stable 
oxygen  is  composed  of  two  oxygen  atoms,  so  its  chemical  formula  is  O2. 

When  chemical  compounds  are  combined  under  the  right  conditions,  the  atoms  in  their  molecules  rearrange  to  form  new 
compounds. For example, when methane burns, the methane (CH4) and stable oxygen (O2) react to form carbon dioxide
(CO2) and water (H2O). This is indicated by the chemical equation

\[ \mathrm{CH_4 + O_2 \longrightarrow CO_2 + H_2O} \tag{4} \]

The  molecules  to  the  left  of  the  arrow  are  called  the  reactants  and  those  to  the  right  the  products.  In  this  equation  the  plus 
signs  serve  to  separate  the  molecules  and  are  not  intended  as  algebraic  operations.  However,  this  equation  does  not  tell  the 
whole  story,  since  it  fails  to  account  for  the  proportions  of  molecules  required  for  a complete  reaction  (no  reactants  left 
over). For example, we can see from the right side of (4) that to produce one molecule of carbon dioxide and one molecule
of water, one needs three oxygen atoms for each carbon atom. However, from the left side of (4) we see that one molecule of
methane  and  one  molecule  of  stable  oxygen  have  only  two  oxygen  atoms  for  each  carbon  atom.  Thus,  on  the  reactant  side 
the  ratio  of  methane  to  stable  oxygen  cannot  be  one-to-one  in  a complete  reaction. 

A chemical  equation  is  said  to  be  balanced  if  for  each  type  of  atom  in  the  reaction,  the  same  number  of  atoms  appears  on 
each  side  of  the  arrow.  For  example,  the  balanced  version  of  Equation  4 is 

\[ \mathrm{CH_4 + 2O_2 \longrightarrow CO_2 + 2H_2O} \tag{5} \]

by  which  we  mean  that  one  methane  molecule  combines  with  two  stable  oxygen  molecules  to  produce  one  carbon  dioxide 
molecule  and  two  water  molecules.  In  theory,  one  could  multiply  this  equation  through  by  any  positive  integer.  For 
example,  multiplying  through  by  2 yields  the  balanced  chemical  equation 

\[ \mathrm{2CH_4 + 4O_2 \longrightarrow 2CO_2 + 4H_2O} \]

However,  the  standard  convention  is  to  use  the  smallest  positive  integers  that  will  balance  the  equation. 

Equation  4 is  sufficiently  simple  that  it  could  have  been  balanced  by  trial  and  error,  but  for  more  complicated  chemical 
equations  we  will  need  a systematic  method.  There  are  various  methods  that  can  be  used,  but  we  will  give  one  that  uses 
systems  of  linear  equations.  To  illustrate  the  method  let  us  reexamine  Equation  4.  To  balance  this  equation  we  must  find 
positive integers $x_1$, $x_2$, $x_3$, and $x_4$ such that

\[ x_1(\mathrm{CH_4}) + x_2(\mathrm{O_2}) \longrightarrow x_3(\mathrm{CO_2}) + x_4(\mathrm{H_2O}) \tag{6} \]

For  each  of  the  atoms  in  the  equation,  the  number  of  atoms  on  the  left  must  be  equal  to  the  number  of  atoms  on  the  right. 
Expressing  this  in  tabular  form  we  have 


            Left Side       Right Side
Carbon      $x_1$       =   $x_3$
Hydrogen    $4x_1$      =   $2x_4$
Oxygen      $2x_2$      =   $2x_3 + x_4$

from which we obtain the homogeneous linear system

\[ \begin{aligned} x_1 - x_3 &= 0 \\ 4x_1 - 2x_4 &= 0 \\ 2x_2 - 2x_3 - x_4 &= 0 \end{aligned} \]

The augmented matrix for this system is

\[ \begin{bmatrix} 1 & 0 & -1 & 0 & 0 \\ 4 & 0 & 0 & -2 & 0 \\ 0 & 2 & -2 & -1 & 0 \end{bmatrix} \]

We leave it for you to show that the reduced row echelon form of this matrix is

\[ \begin{bmatrix} 1 & 0 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 1 & -\frac{1}{2} & 0 \end{bmatrix} \]

from which we conclude that the general solution of the system is

\[ x_1 = t/2, \quad x_2 = t, \quad x_3 = t/2, \quad x_4 = t \]

where $t$ is arbitrary. The smallest positive integer values for the unknowns occur when we let $t = 2$, so the equation can be
balanced by letting $x_1 = 1$, $x_2 = 2$, $x_3 = 1$, $x_4 = 2$. This agrees with our earlier conclusions, since substituting these
values into Equation (6) yields Equation (5).

EXAMPLE  5 Balancing  Chemical  Equations  Using  Linear  Systems 

Balance  the  chemical  equation 

\[ \mathrm{HCl + Na_3PO_4 \longrightarrow H_3PO_4 + NaCl} \]
[hydrochloric acid] + [sodium phosphate] → [phosphoric acid] + [sodium chloride]

Let $x_1$, $x_2$, $x_3$, and $x_4$ be positive integers that balance the equation

\[ x_1(\mathrm{HCl}) + x_2(\mathrm{Na_3PO_4}) \longrightarrow x_3(\mathrm{H_3PO_4}) + x_4(\mathrm{NaCl}) \tag{7} \]

Equating the number of atoms of each type on the two sides yields

\[ \begin{aligned} 1x_1 &= 3x_3 && \text{Hydrogen (H)} \\ 1x_1 &= 1x_4 && \text{Chlorine (Cl)} \\ 3x_2 &= 1x_4 && \text{Sodium (Na)} \\ 1x_2 &= 1x_3 && \text{Phosphorus (P)} \\ 4x_2 &= 4x_3 && \text{Oxygen (O)} \end{aligned} \]

from which we obtain the homogeneous linear system

\[ \begin{aligned} x_1 - 3x_3 &= 0 \\ x_1 - x_4 &= 0 \\ 3x_2 - x_4 &= 0 \\ x_2 - x_3 &= 0 \\ 4x_2 - 4x_3 &= 0 \end{aligned} \]

We leave it for you to show that the reduced row echelon form of the augmented matrix for this system is

\[ \begin{bmatrix} 1 & 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & -\frac{1}{3} & 0 \\ 0 & 0 & 1 & -\frac{1}{3} & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \]

from which we conclude that the general solution of the system is

\[ x_1 = t, \quad x_2 = t/3, \quad x_3 = t/3, \quad x_4 = t \]

where $t$ is arbitrary. To obtain the smallest positive integers that balance the equation, we let $t = 3$, in which
case we obtain $x_1 = 3$, $x_2 = 1$, $x_3 = 1$, and $x_4 = 3$. Substituting these values in (7) produces the balanced
equation

\[ \mathrm{3HCl + Na_3PO_4 \longrightarrow H_3PO_4 + 3NaCl} \]
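The balancing method used for methane and in Example 5 can be sketched in code. This is an illustration under a stated assumption: the homogeneous system has exactly one free variable and it is the last unknown, which holds for both reactions in this section but not for every chemical equation. The names `rref` and `balance` are hypothetical helpers.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def rref(M):
    """Reduced row echelon form of M, computed with exact fractions."""
    A = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        p = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        A[r], A[p] = A[p], A[r]
        lead = A[r][c]
        A[r] = [v / lead for v in A[r]]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        r += 1
    return A

def balance(coefficients):
    """Smallest positive integers solving the homogeneous system.

    Assumes the last unknown is the only free variable.
    """
    n = len(coefficients[0])
    R = rref(coefficients)
    # With x_n = 1, row i of the RREF reads x_i + R[i][n-1] = 0.
    solution = [-R[i][n - 1] for i in range(n - 1)] + [Fraction(1)]
    # Multiply by the least common multiple of the denominators.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (s.denominator for s in solution), 1)
    return [int(s * scale) for s in solution]

# Coefficient matrices (rows: atom types, columns: x1..x4).
methane = [[1, 0, -1, 0],    # carbon
           [4, 0, 0, -2],    # hydrogen
           [0, 2, -2, -1]]   # oxygen
acid = [[1, 0, -3, 0],       # hydrogen
        [1, 0, 0, -1],       # chlorine
        [0, 3, 0, -1],       # sodium
        [0, 1, -1, 0],       # phosphorus
        [0, 4, -4, 0]]       # oxygen

print(balance(methane), balance(acid))
```

Scaling by the least common multiple of the denominators plays the role of choosing $t = 2$ or $t = 3$ in the worked solutions.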


Polynomial  Interpolation 

An  important  problem  in  various  applications  is  to  find  a polynomial  whose  graph  passes  through  a specified  set  of  points 
in  the  plane;  this  is  called  an  interpolating  polynomial  for  the  points.  The  simplest  example  of  such  a problem  is  to  find  a 
linear  polynomial 


\[ p(x) = ax + b \tag{8} \]

whose graph passes through two known distinct points, $(x_1, y_1)$ and $(x_2, y_2)$, in the xy-plane (Figure 1.8.10). You have
probably encountered various methods in analytic geometry for finding the equation of a line through two points, but here
we will give a method based on linear systems that can be adapted to general polynomial interpolation.

Figure 1.8.10


The graph of (8) is the line $y = ax + b$, and for this line to pass through the points $(x_1, y_1)$ and $(x_2, y_2)$, we must have

\[ y_1 = ax_1 + b \quad\text{and}\quad y_2 = ax_2 + b \]

Therefore, the unknown coefficients $a$ and $b$ can be obtained by solving the linear system

\[ \begin{aligned} ax_1 + b &= y_1 \\ ax_2 + b &= y_2 \end{aligned} \]

We don't need any fancy methods to solve this system: the value of $a$ can be obtained by subtracting the equations to
eliminate $b$, and then the value of $a$ can be substituted into either equation to find $b$. We leave it as an exercise for you to
find $a$ and $b$ and then show that they can be expressed in the form

\[ a = \frac{y_2 - y_1}{x_2 - x_1} \quad\text{and}\quad b = \frac{y_1x_2 - y_2x_1}{x_2 - x_1} \tag{9} \]

provided $x_1 \ne x_2$. Thus, for example, the line $y = ax + b$ that passes through the points

(2, 1) and (5, 4)

can be obtained by taking $(x_1, y_1) = (2, 1)$ and $(x_2, y_2) = (5, 4)$, in which case (9) yields

\[ a = \frac{4 - 1}{5 - 2} = 1 \quad\text{and}\quad b = \frac{(1)(5) - (4)(2)}{5 - 2} = -1 \]

Therefore, the equation of the line is

\[ y = x - 1 \]

(Figure 1.8.11).


Figure 1.8.11


Now let us consider the more general problem of finding a polynomial whose graph passes through $n$ points with distinct
x-coordinates

\[ (x_1, y_1),\ (x_2, y_2),\ (x_3, y_3),\ \ldots,\ (x_n, y_n) \tag{10} \]

Since there are $n$ conditions to be satisfied, intuition suggests that we should begin by looking for a polynomial of the form

\[ p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_{n-1}x^{n-1} \tag{11} \]

since a polynomial of this form has $n$ coefficients that are at our disposal to satisfy the $n$ conditions. However, we want to
allow for cases where the points may lie on a line or have some other configuration that would make it possible to use a
polynomial whose degree is less than $n - 1$; thus, we allow for the possibility that $a_{n-1}$ and other coefficients in (11) may
be zero.

The following theorem, which we will prove later in the text, is the basic result on polynomial interpolation.


Polynomial Interpolation

Given any $n$ points in the xy-plane that have distinct x-coordinates, there is a unique polynomial of degree $n - 1$
or less whose graph passes through those points.


Let us now consider how we might go about finding the interpolating polynomial (11) whose graph passes through the points
in (10). Since the graph of this polynomial is the graph of the equation

\[ y = a_0 + a_1x + a_2x^2 + \cdots + a_{n-1}x^{n-1} \tag{12} \]

it follows that the coordinates of the points must satisfy

\[ \begin{aligned} a_0 + a_1x_1 + a_2x_1^2 + \cdots + a_{n-1}x_1^{n-1} &= y_1 \\ a_0 + a_1x_2 + a_2x_2^2 + \cdots + a_{n-1}x_2^{n-1} &= y_2 \\ &\ \,\vdots \\ a_0 + a_1x_n + a_2x_n^2 + \cdots + a_{n-1}x_n^{n-1} &= y_n \end{aligned} \tag{13} \]

In these equations the values of the x's and y's are assumed to be known, so we can view this as a linear system in the
unknowns $a_0, a_1, \ldots, a_{n-1}$. From this point of view the augmented matrix for the system is

\[ \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} & y_1 \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} & y_2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} & y_n \end{bmatrix} \tag{14} \]

and  hence  the  interpolating  polynomial  can  be  found  by  reducing  this  matrix  to  reduced  row  echelon  form  (Gauss-Jordan 
elimination). 

EXAMPLE  6 Polynomial  Interpolation  by  Gauss-Jordan  Elimination 

Find  a cubic  polynomial  whose  graph  passes  through  the  points 

\[ (1, 3),\quad (2, -2),\quad (3, -5),\quad (4, 0) \]

Since there are four points, we will use an interpolating polynomial of degree 3. Denote this
polynomial by

\[ p(x) = a_0 + a_1x + a_2x^2 + a_3x^3 \]

and denote the x- and y-coordinates of the given points by

\[ x_1 = 1,\ x_2 = 2,\ x_3 = 3,\ x_4 = 4 \quad\text{and}\quad y_1 = 3,\ y_2 = -2,\ y_3 = -5,\ y_4 = 0 \]

Thus, it follows from (14) that the augmented matrix for the linear system in the unknowns $a_0$, $a_1$, $a_2$, and $a_3$
is

\[ \begin{bmatrix} 1 & x_1 & x_1^2 & x_1^3 & y_1 \\ 1 & x_2 & x_2^2 & x_2^3 & y_2 \\ 1 & x_3 & x_3^2 & x_3^3 & y_3 \\ 1 & x_4 & x_4^2 & x_4^3 & y_4 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 & 3 \\ 1 & 2 & 4 & 8 & -2 \\ 1 & 3 & 9 & 27 & -5 \\ 1 & 4 & 16 & 64 & 0 \end{bmatrix} \]

We leave it for you to confirm that the reduced row echelon form of this matrix is

\[ \begin{bmatrix} 1 & 0 & 0 & 0 & 4 \\ 0 & 1 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 & -5 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix} \]

from which it follows that $a_0 = 4$, $a_1 = 3$, $a_2 = -5$, $a_3 = 1$. Thus, the interpolating polynomial is

\[ p(x) = 4 + 3x - 5x^2 + x^3 \]

The  graph  of  this  polynomial  and  the  given  points  are  shown  in  Figure  1.8.12. 


Figure  1.8.12 


Later  we  will  give  a more  efficient  method  for  finding  interpolating  polynomials  that  is  better  suited  for 
problems  in  which  the  number  of  data  points  is  large. 
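
The computation in Example 6 can be checked with a short program. The following is a minimal sketch, not the text's own method of presentation: it builds the augmented matrix (14) for the four given points and reduces it with exact rational arithmetic. The function names `rref` and `interpolating_coefficients` are illustrative choices, and the code assumes the x-coordinates are distinct so that the system has a unique solution.

```python
from fractions import Fraction

def rref(matrix):
    """Return the reduced row echelon form of a matrix (list of rows), using exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, n_rows) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        lead = rows[pivot_row][col]
        rows[pivot_row] = [v / lead for v in rows[pivot_row]]  # make the leading entry 1
        for r in range(n_rows):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return rows

def interpolating_coefficients(points):
    """Return [a_0, ..., a_{n-1}] for n points with distinct x-coordinates (assumed)."""
    n = len(points)
    # Row i of the augmented matrix is [1, x_i, x_i^2, ..., x_i^(n-1) | y_i].
    augmented = [[Fraction(x) ** k for k in range(n)] + [Fraction(y)] for x, y in points]
    return [row[-1] for row in rref(augmented)]

points = [(1, 3), (2, -2), (3, -5), (4, 0)]
coeffs = interpolating_coefficients(points)
print([str(c) for c in coeffs])  # coefficients a_0, a_1, a_2, a_3
```

Exact fractions are used here so that the reduced matrix can be compared entry by entry with the one shown in the example; floating-point arithmetic would also work but would introduce rounding.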

CALCULUS  AND  CALCULATING  UTILITY  REQUIRED 

EXAMPLE  7 Approximate  Integration 

There is no way to evaluate the integral

\[ \int_0^1 \sin\left(\frac{\pi x^2}{2}\right) dx \]

directly since there is no way to express an antiderivative of the integrand in terms of elementary functions.
This integral could be approximated by Simpson's rule or some comparable method, but an alternative
approach is to approximate the integrand by an interpolating polynomial and integrate the approximating
polynomial. For example, let us consider the five points

\[ x_0 = 0,\ x_1 = 0.25,\ x_2 = 0.5,\ x_3 = 0.75,\ x_4 = 1 \]

that divide the interval [0, 1] into four equally spaced subintervals. The values of

\[ f(x) = \sin\left(\frac{\pi x^2}{2}\right) \]

at these points are approximately

\[ f(0) = 0,\ f(0.25) = 0.098017,\ f(0.5) = 0.382683,\ f(0.75) = 0.77301,\ f(1) = 1 \]

The interpolating polynomial is (verify)

\[ p(x) = 0.098796x + 0.762356x^2 + 2.14429x^3 - 2.00544x^4 \tag{15} \]

and

\[ \int_0^1 p(x)\,dx \approx 0.438501 \tag{16} \]

As  shown  in  Figure  1.8.13,  the  graphs  of / and p match  very  closely  over  the  interval  [0,  1],  so  the 
approximation  is  quite  good. 
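
The "(verify)" step in Example 7 can be carried out numerically. The sketch below is one possible way to do it, under the assumption that ordinary floating-point arithmetic is accurate enough for five equally spaced points: it solves the interpolation system by Gauss-Jordan elimination with partial pivoting and then integrates the polynomial term by term over [0, 1]. The helper name `solve_linear_system` is an illustrative choice.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting (floats).

    Assumes the coefficient matrix is square and invertible.
    """
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [v / lead for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]

def f(x):
    return math.sin(math.pi * x * x / 2)

xs = [0.0, 0.25, 0.5, 0.75, 1.0]
# Row i of the coefficient matrix is [1, x_i, x_i^2, x_i^3, x_i^4].
vandermonde = [[x ** k for k in range(len(xs))] for x in xs]
coeffs = solve_linear_system(vandermonde, [f(x) for x in xs])

# Integrate a_k x^k over [0, 1] term by term: each term contributes a_k / (k + 1).
integral = sum(c / (k + 1) for k, c in enumerate(coeffs))
print(coeffs)
print(integral)
```

The printed coefficients can be compared with (15) and the printed integral with (16); small differences in the last digits are to be expected from rounding.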

Figure 1.8.13 [graphs of p(x) and sin(πx²/2); figure not reproduced]


Concept  Review 

Network 

Branches 

Nodes 

Flow  conservation 

Electrical circuits: battery, resistor, poles (positive and negative), electrical potential, Ohm's law, Kirchhoff's
current law, Kirchhoff's voltage law

Chemical  equations:  reactants,  products,  balanced  equation 
Interpolating  polynomial 

Skills 

Find  the  flow  rates  and  directions  of  flow  in  branches  of  a network. 

Find  the  amount  of  current  flowing  through  parts  of  an  electrical  circuit. 

Write  a balanced  chemical  equation  for  a given  chemical  reaction. 

Find  an  interpolating  polynomial  for  a graph  passing  through  a given  collection  of  points. 


Exercise  Set  1 .8 

1.  The  accompanying  figure  shows  a network  in  which  the  flow  rate  and  direction  of  flow  in  certain  branches  are  known. 
Find  the  flow  rates  and  directions  of  flow  in  the  remaining  branches. 


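Figure Ex-1 [network diagram not reproduced; surviving label: 50]


Answer:


[Answer network diagram not reproduced; surviving label: 50]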


2.  The  accompanying  figure  shows  known  flow  rates  of  hydrocarbons  into  and  out  of  a network  of  pipes  at  an  oil  refinery, 
(a)  Set  up  a linear  system  whose  solution  provides  the  unknown  flow  rates. 


(b)  Solve  the  system  for  the  unknown  flow  rates. 

(c) Find the flow rates and directions of flow if x_4 = 50 and x_6 = 0.

Figure Ex-2 [network diagram not reproduced; surviving labels: 150, x_5, 200, 175]

3.  The  accompanying  figure  shows  a network  of  one-way  streets  with  traffic  flowing  in  the  directions  indicated.  The  flow 
rates  along  the  streets  are  measured  as  the  average  number  of  vehicles  per  hour. 

(a)  Set  up  a linear  system  whose  solution  provides  the  unknown  flow  rates. 

(b)  Solve  the  system  for  the  unknown  flow  rates. 

(c)  If  the  flow  along  the  road  from  A to  B must  be  reduced  for  construction,  what  is  the  minimum  flow  that  is  required 
to  keep  traffic  flowing  on  all  roads? 


Answer: 


(a) x_3 − x_4 = −500, −x_1 + x_4 = 100, x_1 − x_2 = 300, x_2 − x_3 = 100

(b) x_1 = −100 + t, x_2 = −400 + t, x_3 = −500 + t, x_4 = t

(c) For all rates to be nonnegative, we need t ≥ 500 cars per hour; the minimum is t = 500, for which x_1 = 400, x_2 = 100, x_3 = 0, x_4 = 500

4.  The  accompanying  figure  shows  a network  of  one-way  streets  with  traffic  flowing  in  the  directions  indicated.  The  flow 
rates  along  the  streets  are  measured  as  the  average  number  of  vehicles  per  hour. 

(a)  Set  up  a linear  system  whose  solution  provides  the  unknown  flow  rates. 

(b)  Solve  the  system  for  the  unknown  flow  rates. 

(c)  Is  it  possible  to  close  the  road  from  A to  B for  construction  and  keep  traffic  flowing  on  the  other  streets?  Explain. 
Figure Ex-4


In  Exercises  5-8,  analyze  the  given  electrical  circuits  by  finding  the  unknown  currents. 


5. [Circuit diagram not reproduced; surviving labels: 8 V, 20 Ω]

Answer: 

/i=/4  = /5  = /6  = 1a,  l2  = h = OA 


In  Exercises  9-12,  write  a balanced  equation  for  the  given  chemical  reaction. 

9. C3H8 + O2 → CO2 + H2O (propane combustion)

Answer:

x_1 = 1, x_2 = 5, x_3 = 3, and x_4 = 4; the balanced equation is C3H8 + 5O2 → 3CO2 + 4H2O

10. C6H12O6 → CO2 + C2H5OH (fermentation of sugar)

11. CH3COF + H2O → CH3COOH + HF

Answer:

x_1 = x_2 = x_3 = x_4 = t; the balanced equation is CH3COF + H2O → CH3COOH + HF

12. CO2 + H2O → C6H12O6 + O2 (photosynthesis)

13.  Find  the  quadratic  polynomial  whose  graph  passes  through  the  points  (1,  1),  (2,  2),  and  (3,  5). 
Answer: 

p(x) = x^2 − 2x + 2

14.  Find  the  quadratic  polynomial  whose  graph  passes  through  the  points  (0,  0),  (-1,  1),  and  (1,  1). 

15.  Find  the  cubic  polynomial  whose  graph  passes  through  the  points  (-1,  -1),  (0,  1),  (1,  3),  (4,  -1). 

Answer: 
p(x) = 1 + (13/6)x − (1/6)x^3

16.  The  accompanying  figure  shows  the  graph  of  a cubic  polynomial.  Find  the  polynomial. 


Figure  Ex-16 

17. (a) Find an equation that represents the family of all second-degree polynomials that pass through the points (0, 1) and
(1, 2). [Hint: The equation will involve one arbitrary parameter that produces the members of the family when
varied.]

(b)  By  hand,  or  with  the  help  of  a graphing  utility,  sketch  four  curves  in  the  family. 

Answer: 

(a) Using a_1 = k as a parameter, p(x) = 1 + kx + (1 − k)x^2, where −∞ < k < ∞.

(b)  The  graphs  for  k = 0,  1,  2,  and  3 are  shown. 
18. In this section we have selected only a few applications of linear systems. Using the Internet as a search tool, try to find
some more real-world applications of such systems. Select one that is of interest to you, and write a paragraph about it.


True-False  Exercises 


In  parts  (a)-(e)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  In  any  network,  the  sum  of  the  flows  out  of  a node  must  equal  the  sum  of  the  flows  into  a node. 

Answer: 

True 

(b)  When  a current  passes  through  a resistor,  there  is  an  increase  in  the  electrical  potential  in  a circuit. 

Answer: 

False 

(c) Kirchhoff's current law states that the sum of the currents flowing into a node equals the sum of the currents flowing out
of the node.

Answer: 

True 

(d) A chemical equation is called balanced if the total number of atoms on each side of the equation is the same.

Answer: 

False 

(e) Given any n points in the xy-plane, there is a unique polynomial of degree n − 1 or less whose graph passes through
those points.

Answer: 

False 
1.9 Leontief Input-Output Models

In  1973  the  economist  Wassily  Leontief  was  awarded  the  Nobel  prize  for  his  work  on  economic  modeling  in  which  he 
used  matrix  methods  to  study  the  relationships  between  different  sectors  in  an  economy.  In  this  section  we  will  discuss 
some  of  the  ideas  developed  by  Leontief. 


Inputs  and  Outputs  in  an  Economy 

One  way  to  analyze  an  economy  is  to  divide  it  into  sectors  and  study  how  the  sectors  interact  with  one  another.  For 
example,  a simple  economy  might  be  divided  into  three  sectors — manufacturing,  agriculture,  and  utilities.  Typically,  a 
sector  will  produce  certain  outputs  but  will  require  inputs  from  the  other  sectors  and  itself.  For  example,  the  agricultural 
sector  may  produce  wheat  as  an  output  but  will  require  inputs  of  farm  machinery  from  the  manufacturing  sector, 
electrical  power  from  the  utilities  sector,  and  food  from  its  own  sector  to  feed  its  workers.  Thus,  we  can  imagine  an 
economy  to  be  a network  in  which  inputs  and  outputs  flow  in  and  out  of  the  sectors;  the  study  of  such  flows  is  called 
input-output  analysis.  Inputs  and  outputs  are  commonly  measured  in  monetary  units  (dollars  or  millions  of  dollars,  for 
example)  but  other  units  of  measurement  are  also  possible. 

The  flows  between  sectors  of  a real  economy  are  not  always  obvious.  For  example,  in  World  War  II  the  United  States  had 
a demand  for  50,000  new  airplanes  that  required  the  construction  of  many  new  aluminum  manufacturing  plants.  This 
produced  an  unexpectedly  large  demand  for  certain  copper  electrical  components,  which  in  turn  produced  a copper 
shortage.  The  problem  was  eventually  resolved  by  using  silver  borrowed  from  Fort  Knox  as  a copper  substitute.  In  all 
likelihood modern input-output analysis would have anticipated the copper shortage.

Most  sectors  of  an  economy  will  produce  outputs,  but  there  may  exist  sectors  that  consume  outputs  without  producing 
anything  themselves  (the  consumer  market,  for  example).  Those  sectors  that  do  not  produce  outputs  are  called  open 
sectors. Economies with no open sectors are called closed economies, and economies with one or more open sectors are
called  open  economies  (Figure  1.9.1).  In  this  section  we  will  be  concerned  with  economies  with  one  open  sector,  and  our 
primary  goal  will  be  to  determine  the  output  levels  that  are  required  for  the  productive  sectors  to  sustain  themselves  and 
satisfy  the  demand  of  the  open  sector. 


Figure 1.9.1 [diagram not reproduced; sector labels: Manufacturing, Agriculture, Utilities]


Leontief  Model  of  an  Open  Economy 


Let us consider a simple open economy with one open sector and three product-producing sectors: manufacturing,
agriculture, and utilities. Assume that inputs and outputs are measured in dollars and that the inputs required by the
productive sectors to produce one dollar's worth of output are in accordance with Table 1.


Table 1

                               Input Required per Dollar Output
                          Manufacturing   Agriculture   Utilities
Provider   Manufacturing      $0.50          $0.10        $0.10
           Agriculture        $0.20          $0.50        $0.30
           Utilities          $0.10          $0.30        $0.40

Wassily  Leontief  (1906-1999) 

It  is  somewhat  ironic  that  it  was  the  Russian-born  Wassily  Leontief  who  won  the  Nobel  prize 
in 1973 for pioneering the modern methods for analyzing free-market economies. Leontief was a precocious
student  who  entered  the  University  of  Leningrad  at  age  15.  Bothered  by  the  intellectual  restrictions  of  the  Soviet 
system,  he  was  put  in  jail  for  anti-Communist  activities,  after  which  he  headed  for  the  University  of  Berlin, 
receiving  his  Ph.D.  there  in  1928.  He  came  to  the  United  States  in  1931,  where  he  held  professorships  at  Harvard 
and  then  New  York  University. 

[Image: © Bettmann/© Corbis]


Usually, one would suppress the labeling and express this matrix as

\[
C = \begin{bmatrix} 0.5 & 0.1 & 0.1 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.3 & 0.4 \end{bmatrix}
\tag{1}
\]


This is called the consumption matrix (or sometimes the technology matrix) for the economy. The column vectors

\[
c_1 = \begin{bmatrix} 0.5 \\ 0.2 \\ 0.1 \end{bmatrix}, \quad
c_2 = \begin{bmatrix} 0.1 \\ 0.5 \\ 0.3 \end{bmatrix}, \quad
c_3 = \begin{bmatrix} 0.1 \\ 0.3 \\ 0.4 \end{bmatrix}
\]

in C list the inputs required by the manufacturing, agricultural, and utilities sectors, respectively, to produce $1.00 worth
of output. These are called the consumption vectors of the sectors. For example, c_1 tells us that to produce $1.00 worth of
output the manufacturing sector needs $0.50 worth of manufacturing output, $0.20 worth of agricultural output, and
$0.10 worth of utilities output.

What  is  the  economic  significance  of  the  row  sums 
of  the  consumption  matrix? 


Continuing  with  the  above  example,  suppose  that  the  open  sector  wants  the  economy  to  supply  it  manufactured  goods, 
agricultural  products,  and  utilities  with  dollar  values: 

d_1 dollars of manufactured goods
d_2 dollars of agricultural products
d_3 dollars of utilities

The  column  vector  d that  has  these  numbers  as  successive  components  is  called  the  outside  demand  vector.  Since  the 
product-producing  sectors  consume  some  of  their  own  output,  the  dollar  value  of  their  output  must  cover  their  own  needs 
plus  the  outside  demand.  Suppose  that  the  dollar  values  required  to  do  this  are 

x_1 dollars of manufactured goods
x_2 dollars of agricultural products
x_3 dollars of utilities


The column vector x that has these numbers as successive components is called the production vector for the economy.
For the economy with consumption matrix (1), that portion of the production vector x that will be consumed by the three
productive sectors is

\[
C\mathbf{x} =
\begin{bmatrix} 0.5 & 0.1 & 0.1 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.3 & 0.4 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= x_1 \begin{bmatrix} 0.5 \\ 0.2 \\ 0.1 \end{bmatrix}
+ x_2 \begin{bmatrix} 0.1 \\ 0.5 \\ 0.3 \end{bmatrix}
+ x_3 \begin{bmatrix} 0.1 \\ 0.3 \\ 0.4 \end{bmatrix}
\]

The vector Cx is called the intermediate demand vector for the economy. Once the intermediate demand is met, the
portion of the production that is left to satisfy the outside demand is x − Cx. Thus, if the outside demand vector is d, then
x must satisfy the equation


\[
\underbrace{\mathbf{x}}_{\text{Amount produced}}
- \underbrace{C\mathbf{x}}_{\text{Intermediate demand}}
= \underbrace{\mathbf{d}}_{\text{Outside demand}}
\]

which  we  will  find  convenient  to  rewrite  as 


\[ (I - C)\mathbf{x} = \mathbf{d} \tag{2} \]

The matrix I − C is called the Leontief matrix and (2) is called the Leontief equation.
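
To make the bookkeeping concrete, the following sketch computes the intermediate demand Cx and the amount x − Cx left over for the open sector. The production levels of $100 per sector are a hypothetical choice made only for illustration; they do not come from the text. The helper name `mat_vec` is likewise an illustrative choice.

```python
from fractions import Fraction

# Consumption matrix (1), entered as exact fractions to avoid rounding.
C = [[Fraction(5, 10), Fraction(1, 10), Fraction(1, 10)],
     [Fraction(2, 10), Fraction(5, 10), Fraction(3, 10)],
     [Fraction(1, 10), Fraction(3, 10), Fraction(4, 10)]]

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

x = [100, 100, 100]  # hypothetical production vector, in dollars
intermediate = mat_vec(C, x)  # Cx: what the productive sectors consume
available = [xi - ci for xi, ci in zip(x, intermediate)]  # x - Cx: what is left for outside demand

print([str(v) for v in intermediate])
print([str(v) for v in available])
```

With these hypothetical production levels the agricultural sector's entire output is consumed by the productive sectors, so nothing is left of it for the open sector; an outside demand d with a positive agricultural component would therefore require a different production vector.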

EXAMPLE  1 Satisfying  Outside  Demand 

Consider  the  economy  described  in  Table  1.  Suppose  that  the  open  sector  has  a demand  for  $7900  worth  of 
manufacturing  products,  $3950  worth  of  agricultural  products,  and  $1975  worth  of  utilities. 

(a) Can the economy meet this demand?

(b)  If  so,  find  a production  vector  x that  will  meet  it  exactly. 


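The consumption matrix, production vector, and outside demand vector are

\[
C = \begin{bmatrix} 0.5 & 0.1 & 0.1 \\ 0.2 & 0.5 & 0.3 \\ 0.1 & 0.3 & 0.4 \end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad
\mathbf{d} = \begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix}
\]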

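To meet the outside demand, the vector x must satisfy the Leontief equation (2), so the problem reduces to
solving the linear system

\[
\underbrace{\begin{bmatrix} 0.5 & -0.1 & -0.1 \\ -0.2 & 0.5 & -0.3 \\ -0.1 & -0.3 & 0.6 \end{bmatrix}}_{I - C}
\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}}
=
\underbrace{\begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix}}_{\mathbf{d}}
\tag{4}
\]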

(if consistent). We leave it for you to show that the reduced row echelon form of the augmented matrix for
this system is

\[
\left[\begin{array}{ccc|c}
1 & 0 & 0 & 27{,}500 \\
0 & 1 & 0 & 33{,}750 \\
0 & 0 & 1 & 24{,}750
\end{array}\right]
\]

This tells us that (4) is consistent, and the economy can satisfy the demand of the open sector exactly by
producing $27,500 worth of manufacturing output, $33,750 worth of agricultural output, and $24,750
worth of utilities output.
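
The row reduction left to the reader in Example 1 can be checked as follows. This is a minimal sketch using exact fractions; it forms the Leontief matrix I − C from the consumption matrix of Table 1, augments it with the demand vector, and applies Gauss-Jordan elimination. It assumes I − C is invertible, which the example's result confirms for this economy.

```python
from fractions import Fraction

# Consumption matrix and outside demand from Example 1 (decimal strings give exact fractions).
C = [[Fraction(s) for s in row] for row in (
    ("0.5", "0.1", "0.1"),
    ("0.2", "0.5", "0.3"),
    ("0.1", "0.3", "0.4"),
)]
d = [Fraction(7900), Fraction(3950), Fraction(1975)]
n = len(C)

# Leontief matrix I - C, augmented with d.
aug = [[(1 if i == j else 0) - C[i][j] for j in range(n)] + [d[i]] for i in range(n)]

# Gauss-Jordan elimination (assumes I - C is invertible, so a pivot always exists).
for col in range(n):
    pivot = next(r for r in range(col, n) if aug[r][col] != 0)
    aug[col], aug[pivot] = aug[pivot], aug[col]
    lead = aug[col][col]
    aug[col] = [v / lead for v in aug[col]]
    for r in range(n):
        if r != col:
            factor = aug[r][col]
            aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]

x = [row[-1] for row in aug]  # production vector read from the last column
print([str(v) for v in x])
```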


Productive  Open  Economies 


In  the  preceding  discussion  we  considered  an  open  economy  with  three  product-producing  sectors;  the  same  ideas  apply 
to  an  open  economy  with  n product-producing  sectors.  In  this  case,  the  consumption  matrix,  production  vector,  and 
outside  demand  vector  have  the  form 


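\[
C = \begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & \vdots & & \vdots \\
c_{n1} & c_{n2} & \cdots & c_{nn}
\end{bmatrix}, \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
\mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{bmatrix}
\]

where all entries are nonnegative and

c_ij = the monetary value of the output of the ith sector that is needed by the jth sector to produce one unit of output

x_i = the monetary value of the output of the ith sector

d_i = the monetary value of the output of the ith sector that is required to meet the demand of the open sector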

Note that the jth column vector of C contains the monetary values that the jth sector requires of the other
sectors to produce one monetary unit of output, and the ith row vector of C contains the monetary values required of the
ith sector by the other sectors for each of them to produce one monetary unit of output.


As discussed in our example above, a production vector x that meets the demand d of the outside sector must satisfy the
Leontief equation

\[ (I - C)\mathbf{x} = \mathbf{d} \]

If the matrix I − C is invertible, then this equation has the unique solution

\[ \mathbf{x} = (I - C)^{-1}\mathbf{d} \tag{5} \]

for every demand vector d. However, for x to be a valid production vector it must have nonnegative entries, so the
problem of importance in economics is to determine conditions under which the Leontief equation has a solution with
nonnegative entries.

It is evident from the form of (5) that if I − C is invertible, and if (I − C)^{-1} has nonnegative entries, then for every
demand vector d the corresponding x will also have nonnegative entries, and hence will be a valid production vector for
the economy. Economies for which (I − C)^{-1} has nonnegative entries are said to be productive. Such economies are
desirable because demand can always be met by some level of production. The following theorem, whose proof can be
found in many books on economics, gives conditions under which open economies are productive.


THEOREM  1.9.1 

If C is the consumption matrix for an open economy, and if all of the column sums are less than 1, then the matrix
I − C is invertible, the entries of (I − C)^{-1} are nonnegative, and the economy is productive.


The jth column sum of C represents the total dollar value of input that the jth sector requires to produce $1 of
output, so if the jth column sum is less than 1, then the jth sector requires less than $1 of input to produce $1 of output; in
this case we say that the jth sector is profitable. Thus, Theorem 1.9.1 states that if all product-producing sectors of an
open economy are profitable, then the economy is productive. In the exercises we will ask you to show that an open
economy is productive if all of the row sums of C are less than 1 (Exercise 11). Thus, an open economy is productive if
either all of the column sums or all of the row sums of C are less than 1.
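
As a small illustration of the two sufficient conditions just stated, the sketch below computes the column sums and row sums of the consumption matrix (1). The helper name `all_less_than_one` is an illustrative choice. For this particular matrix the column test applies while the row test does not (one row sums to exactly 1), which shows that the two conditions are independent: either one alone is enough to conclude that the economy is productive.

```python
from fractions import Fraction

# Consumption matrix (1), entered as exact fractions.
C = [[Fraction(s) for s in row] for row in (
    ("0.5", "0.1", "0.1"),
    ("0.2", "0.5", "0.3"),
    ("0.1", "0.3", "0.4"),
)]

column_sums = [sum(row[j] for row in C) for j in range(len(C[0]))]
row_sums = [sum(row) for row in C]

def all_less_than_one(sums):
    """True when every sum is strictly less than 1."""
    return all(s < 1 for s in sums)

print("column sums:", [str(s) for s in column_sums], all_less_than_one(column_sums))
print("row sums:   ", [str(s) for s in row_sums], all_less_than_one(row_sums))
```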


EXAMPLE  2 An  Open  Economy  Whose  Sectors  Are  All  Profitable 

The column sums of the consumption matrix C in (1) are less than 1, so (I − C)^{-1} exists and has nonnegative
entries. Use a calculating utility to confirm this, and use this inverse to solve Equation (4) in Example 1.


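We leave it for you to show that

\[
(I - C)^{-1} \approx
\begin{bmatrix}
2.65823 & 1.13924 & 1.01266 \\
1.89873 & 3.67089 & 2.15190 \\
1.39241 & 2.02532 & 2.91139
\end{bmatrix}
\]

This matrix has nonnegative entries, and

\[
\mathbf{x} = (I - C)^{-1}\mathbf{d} \approx
\begin{bmatrix}
2.65823 & 1.13924 & 1.01266 \\
1.89873 & 3.67089 & 2.15190 \\
1.39241 & 2.02532 & 2.91139
\end{bmatrix}
\begin{bmatrix} 7900 \\ 3950 \\ 1975 \end{bmatrix}
\approx
\begin{bmatrix} 27{,}500 \\ 33{,}750 \\ 24{,}750 \end{bmatrix}
\]

which is consistent with the solution in Example 1.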
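
In place of a calculating utility, the inverse in Example 2 can be computed with a few lines of code. The sketch below is one way to do it, not necessarily how the text intends: it row-reduces the block matrix [I − C | I] with exact fractions, so the entries come out as fractions with denominator 79 whose decimal expansions can be compared with the values shown above. The function name `inverse` is an illustrative choice, and the code assumes the input matrix is invertible.

```python
from fractions import Fraction

def inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination on [matrix | I] (assumes invertibility)."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]  # right-hand block is the inverse

C = [[Fraction(s) for s in row] for row in (
    ("0.5", "0.1", "0.1"),
    ("0.2", "0.5", "0.3"),
    ("0.1", "0.3", "0.4"),
)]
d = [7900, 3950, 1975]
n = len(C)

leontief = [[(1 if i == j else 0) - C[i][j] for j in range(n)] for i in range(n)]
inv = inverse(leontief)
x = [sum(a * b for a, b in zip(row, d)) for row in inv]  # x = (I - C)^(-1) d

for row in inv:
    print([round(float(v), 5) for v in row])
print([str(v) for v in x])
```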


Concept  Review 

Sectors 
Inputs
Outputs 

Input-output  analysis 
Open  sector 

Economies:  open,  closed 


Consumption  (technology)  matrix 
Consumption  vector 
Outside  demand  vector 
Production  vector 
Intermediate  demand  vector 
Leontief  matrix 
Leontief  equation 

Skills 

Construct  a consumption  matrix  for  an  economy. 

Understand  the  relationships  among  the  vectors  of  a sector  of  an  economy:  consumption,  outside  demand, 
production,  and  intermediate  demand. 


Exercise  Set  1 .9 

1. An automobile mechanic (M) and a body shop (B) use each other's services. For each $1.00 of business that M does, it
uses $0.50 of its own services and $0.25 of B's services, and for each $1.00 of business that B does it uses $0.10 of its
own services and $0.25 of M's services.

(a)  Construct  a consumption  matrix  for  this  economy. 

(b)  How  much  must  M and  B each  produce  to  provide  customers  with  $7000  worth  of  mechanical  work  and  $14,000 
worth  of  body  work? 


Answer: 

(a) \[ \begin{bmatrix} 0.50 & 0.25 \\ 0.25 & 0.10 \end{bmatrix} \]

(b) \[ \begin{bmatrix} \$25{,}290 \\ \$22{,}581 \end{bmatrix} \]

2. A simple economy produces food (F) and housing (H). The production of $1.00 worth of food requires $0.30 worth of
food and $0.10 worth of housing, and the production of $1.00 worth of housing requires $0.20 worth of food and
$0.60 worth of housing.

(a)  Construct  a consumption  matrix  for  this  economy. 

(b)  What  dollar  value  of  food  and  housing  must  be  produced  for  the  economy  to  provide  consumers  $130,000  worth 
of  food  and  $130,000  worth  of  housing? 

3.  Consider  the  open  economy  described  by  the  accompanying  table,  where  the  input  is  in  dollars  needed  for  $1.00  of 
output. 

(a)  Find  the  consumption  matrix  for  the  economy. 

(b)  Suppose  that  the  open  sector  has  a demand  for  $1930  worth  of  housing,  $3860  worth  of  food,  and  $5790  worth  of 
utilities.  Use  row  reduction  to  find  a production  vector  that  will  meet  this  demand  exactly. 

Table Ex-3

                           Input Required per Dollar Output
                        Housing      Food      Utilities
Provider   Housing       $0.10       $0.60       $0.40
           Food          $0.30       $0.20       $0.30
           Utilities     $0.40       $0.10       $0.20

Answer: 


(a) \[ \begin{bmatrix} 0.1 & 0.6 & 0.4 \\ 0.3 & 0.2 & 0.3 \\ 0.4 & 0.1 & 0.2 \end{bmatrix} \]

(b) \[ \begin{bmatrix} \$31{,}500 \\ \$26{,}500 \\ \$26{,}300 \end{bmatrix} \]

4.  A company  produces  Web  design,  software,  and  networking  services.  View  the  company  as  an  open  economy 
described  by  the  accompanying  table,  where  input  is  in  dollars  needed  for  $1.00  of  output. 

(a)  Find  the  consumption  matrix  for  the  company. 

(b)  Suppose  that  the  customers  (the  open  sector)  have  a demand  for  $5400  worth  of  Web  design,  $2700  worth  of 
software,  and  $900  worth  of  networking.  Use  row  reduction  to  find  a production  vector  that  will  meet  this  demand 
exactly. 


Table Ex-4

                             Input Required per Dollar Output
                         Web Design    Software    Networking
Provider   Web Design      $0.40        $0.20        $0.45
           Software        $0.30        $0.35        $0.30
           Networking      $0.15        $0.10        $0.20

In  Exercises  5-6,  use  matrix  inversion  to  find  the  production  vector  x that  meets  the  demand  d for  the  consumption 
matrix  C. 


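5. C = [matrix only partly recoverable; surviving first-column entries: 0.1, 0.5]

Answer:

\[ \mathbf{x} = \begin{bmatrix} 123.08 \\ 202.56 \end{bmatrix} \]

6. C = [matrix only partly recoverable; surviving first-column entries: 0.3, 0.3]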

7. Consider an open economy with consumption matrix

\[ C = \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & 1 \end{bmatrix} \]

(a) Show that the economy can meet a demand of d_1 = 2 units from the first sector and d_2 = 0 units from the second
sector, but it cannot meet a demand of d_1 = 2 units from the first sector and d_2 = 1 unit from the second sector.

(b)  Give  both  a mathematical  and  an  economic  explanation  of  the  result  in  part  (a). 


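8. Consider an open economy with consumption matrix

\[
C = \begin{bmatrix}
\frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\
\frac{1}{2} & \frac{1}{8} & \frac{1}{4} \\
\frac{1}{2} & \frac{1}{4} & \frac{1}{8}
\end{bmatrix}
\]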


If  the  open  sector  demands  the  same  dollar  value  from  each  product-producing  sector,  which  such  sector  must 
produce  the  greatest  dollar  value  to  meet  the  demand? 


9. Consider an open economy with consumption matrix

\[ C = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & 0 \end{bmatrix} \]

Show that the Leontief equation x − Cx = d has a unique solution for every demand vector d if c_21 c_12 < 1 − c_11.

10. (a) Consider an open economy with a consumption matrix C whose column sums are less than 1, and let x be the
production vector that satisfies an outside demand d; that is, (I − C)^{-1} d = x. Let d_j be the demand vector that is
obtained by increasing the jth entry of d by 1 and leaving the other entries fixed. Prove that the production vector
x_j that meets this demand is

x_j = x + jth column vector of (I − C)^{-1}

(b) In words, what is the economic significance of the jth column vector of (I − C)^{-1}? [Hint: Look at x_j − x.]


11. Prove: If C is an n × n matrix whose entries are nonnegative and whose row sums are less than 1, then I − C is
invertible and (I − C)^{-1} has nonnegative entries. [Hint: (A^T)^{-1} = (A^{-1})^T for any invertible matrix A.]


True-False  Exercises 


In  parts  (a)-(e)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  Sectors  of  an  economy  that  produce  outputs  are  called  open  sectors. 

Answer: 

False 

(b) A  closed  economy  is  an  economy  that  has  no  open  sectors. 

Answer: 

True 

(c)  The  rows  of  a consumption  matrix  represent  the  outputs  in  a sector  of  an  economy. 
Answer: 


False 


(d) If the column sums of the consumption matrix are all less than 1, then the Leontief matrix is invertible.
Answer: 

True 

(e) The Leontief equation relates the production vector for an economy to the outside demand vector.
Answer: 

True 
Supplementary Exercises


In Exercises 1-4 the given matrix represents an augmented matrix for a linear system. Write the
corresponding  set  of  linear  equations  for  the  system,  and  use  Gaussian  elimination  to  solve  the  linear  system. 
Introduce  free  parameters  as  necessary. 


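1.
\[ \left[\begin{array}{cccc|c} 3 & -1 & 0 & 4 & 1 \\ 2 & 0 & 3 & 3 & -1 \end{array}\right] \]

Answer:

\[
\begin{aligned}
3x_1 - x_2 + 4x_4 &= 1 \\
2x_1 + 3x_3 + 3x_4 &= -1
\end{aligned}
\]

\[ x_1 = -\tfrac{3}{2}s - \tfrac{3}{2}t - \tfrac{1}{2}, \quad x_2 = -\tfrac{9}{2}s - \tfrac{1}{2}t - \tfrac{5}{2}, \quad x_3 = s, \quad x_4 = t \]

2.
\[ \left[\begin{array}{cc|c} 1 & 4 & -1 \\ -2 & -8 & 2 \\ 3 & 12 & -3 \\ 0 & 0 & 0 \end{array}\right] \]

3.
\[ \left[\begin{array}{ccc|c} 2 & -4 & 1 & 6 \\ -4 & 0 & 3 & -1 \\ 0 & 1 & -1 & 3 \end{array}\right] \]

Answer:

\[
\begin{aligned}
2x_1 - 4x_2 + x_3 &= 6 \\
-4x_1 + 3x_3 &= -1 \\
x_2 - x_3 &= 3
\end{aligned}
\]

\[ x_1 = -\tfrac{17}{2}, \quad x_2 = -\tfrac{26}{3}, \quad x_3 = -\tfrac{35}{3} \]

4.
\[ \left[\begin{array}{cc|c} 3 & 1 & -2 \\ -9 & -3 & 6 \\ 6 & 2 & 1 \end{array}\right] \]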

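5. Use Gauss-Jordan elimination to solve for x' and y' in terms of x and y.

\[
\begin{aligned}
x &= \tfrac{3}{5}x' - \tfrac{4}{5}y' \\
y &= \tfrac{4}{5}x' + \tfrac{3}{5}y'
\end{aligned}
\]

Answer:

\[ x' = \tfrac{3}{5}x + \tfrac{4}{5}y, \quad y' = -\tfrac{4}{5}x + \tfrac{3}{5}y \]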

6. Use Gauss-Jordan elimination to solve for x' and y' in terms of x and y.

x = x' cos θ − y' sin θ
y = x' sin θ − y' cos θ

7. Find positive integers that satisfy

x + y + z = 9
x + 5y + 10z = 44

Answer:

x = 4, y = 2, z = 3

8.  A box  containing  pennies,  nickels,  and  dimes  has  13  coins  with  a total  value  of  83  cents.  How  many  coins 
of  each  type  are  in  the  box? 

9.  Let 

a 0 b 2 

a a 4 4 

0 a 2 b 

be  the  augmented  matrix  for  a linear  system.  Find  for  what  values  of  a and  b the  system  has 

(a)  a unique  solution. 

(b)  a one-parameter  solution. 

(c)  a two-parameter  solution. 

(d)  no  solution. 

Answer: 

(a) a ≠ 0, b ≠ 2

(b) a ≠ 0, b = 2

(c) a = 0, b = 2

(d) a = 0, b ≠ 2

10.  For  which  value(s)  of  a does  the  following  system  have  zero  solutions?  One  solution?  Infinitely  many 
solutions? 

x_1 + x_2 + x_3 = 4

x_3 = 2

(a^2 − 4)x_3 = a − 2

11. Find a matrix K such that AKB = C given that

\[ C = \begin{bmatrix} 8 & 6 & -6 \\ 6 & -1 & 1 \\ -4 & 0 & 0 \end{bmatrix} \]

Answer:

\[ K = \begin{bmatrix} 0 & 2 \\ 1 & 1 \end{bmatrix} \]

12.  How  should  the  coefficients  a,  b,  and  c be  chosen  so  that  the  system 

ax + by − 3z = −3
−2x − by + cz = −1
ax + 3y − cz = −3

has the solution x = 1, y = −1, and z = 2?


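13. In each part, solve the matrix equation for X.

(a)
\[ X \begin{bmatrix} -1 & 0 & 1 \\ 1 & 1 & 0 \\ 3 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 0 \\ -3 & 1 & 5 \end{bmatrix} \]

(b)
\[ X \begin{bmatrix} 1 & -1 & 2 \\ 3 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -5 & -1 & 0 \\ 6 & -3 & 7 \end{bmatrix} \]

(c)
\[ \begin{bmatrix} 3 & 1 \\ -1 & 2 \end{bmatrix} X - X \begin{bmatrix} 1 & 4 \\ 2 & 0 \end{bmatrix} = \begin{bmatrix} 2 & -2 \\ 5 & 4 \end{bmatrix} \]

Answer:

(a)
\[ X = \begin{bmatrix} -1 & 3 & -1 \\ 6 & 0 & 1 \end{bmatrix} \]

(b)
\[ X = \begin{bmatrix} 1 & -2 \\ 3 & 1 \end{bmatrix} \]

(c)
\[ X = \begin{bmatrix} -\frac{113}{37} & -\frac{160}{37} \\ -\frac{20}{37} & -\frac{46}{37} \end{bmatrix} \]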

14.  Let  A be  a square  matrix. 

(a) Show that (I − A)^{-1} = I + A + A^2 + A^3 if A^4 = 0.

(b) Show that

\[ (I - A)^{-1} = I + A + A^2 + \cdots + A^n \]

if A^{n+1} = 0.


15. Find values of a, b, and c such that the graph of the polynomial p(x) = ax^2 + bx + c passes through the
points (1, 2), (-1, 6), and (2, 3).


Answer: 

a = 1,  b = — 2,  c = 3 

16. (Calculus required) Find values of a, b, and c such that the graph of the polynomial
p(x) = ax^2 + bx + c passes through the point (-1, 0) and has a horizontal tangent at (2, -9).

17. Let J_n be the n × n matrix each of whose entries is 1. Show that if n > 1, then

\[ (I - J_n)^{-1} = I - \frac{1}{n-1} J_n \]

18.  Show  that  if  a square  matrix  A satisfies 

\[ A^3 + 4A^2 - 2A + 7I = 0 \]

then so does A^T.

19. Prove: If B is invertible, then AB^{-1} = B^{-1}A if and only if AB = BA.

20. Prove: If A is invertible, then A + B and I + BA^{-1} are both invertible or both not invertible.


23. (Calculus required) Use part (c) of Exercise 22 to show that

\[ \frac{dA^{-1}}{dx} = -A^{-1}\,\frac{dA}{dx}\,A^{-1} \]

State all the assumptions you make in obtaining this formula.

24. Assuming that the stated inverses exist, prove the following equalities.

(a) \[ (C^{-1} + D^{-1})^{-1} = C(C + D)^{-1}D \]

(b) \[ (I + CD)^{-1}C = C(I + DC)^{-1} \]

(c) \[ (C + DD^T)^{-1}D = C^{-1}D(I + D^T C^{-1} D)^{-1} \]
Determinants


CHAPTER  CONTENTS 

Determinants  by  Cofactor  Expansion 
Evaluating  Determinants  by  Row  Reduction 
Properties  of  Determinants;  Cramer's  Rule 


INTRODUCTION 

In  this  chapter  we  will  study  “determinants”  or,  more  precisely,  “determinant  functions.” 
Unlike real-valued functions, such as f(x) = x^2, that assign a real number to a real
variable x, determinant functions assign a real number f(A) to a matrix variable A.
Although  determinants  first  arose  in  the  context  of  solving  systems  of  linear  equations, 
they  are  no  longer  used  for  that  purpose  in  real-world  applications.  Although  they  can  be 
useful  for  solving  very  small  linear  systems  (say  two  or  three  unknowns),  our  main 
interest  in  them  stems  from  the  fact  that  they  link  together  various  concepts  in  linear 
algebra  and  provide  a useful  formula  for  the  inverse  of  a matrix. 
2.1 Determinants by Cofactor Expansion

In  this  section  we  will  define  the  notion  of  a “determinant.”  This  will  enable  us  to  give  a specific  formula  for  the  inverse  of  an 
invertible  matrix,  whereas  up  to  now  we  have  had  only  a computational  procedure  for  finding  it.  This,  in  turn,  will  eventually 
provide  us  with  a formula  for  solutions  of  certain  kinds  of  linear  systems. 

Recall from Theorem 1.4.5 that the 2 × 2 matrix

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

is invertible if and only if ad − bc ≠ 0 and that the expression ad − bc is called the determinant of the matrix A. Recall also
that this determinant is denoted by writing

\[ \det(A) = ad - bc \quad\text{or}\quad \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc \tag{1} \]

and that the inverse of A can be expressed in terms of the determinant as

\[ A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \tag{2} \]

WARNING

It is important to keep in mind that det(A) is a number, whereas A is a matrix.
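Formulas (1) and (2) translate directly into a few lines of code. The following is a minimal Python sketch, not part of the original text; the function names `det2` and `inverse2` and the sample matrix are illustrative assumptions, and integer entries are assumed so that exact fractions can be used.

```python
from fractions import Fraction


def det2(m):
    """Determinant ad - bc of a 2 x 2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c


def inverse2(m):
    """Inverse of a 2 x 2 integer matrix via Formula (2)."""
    (a, b), (c, d) = m
    det = det2(m)
    if det == 0:
        # ad - bc = 0 means the matrix is not invertible
        raise ValueError("matrix is not invertible (determinant is 0)")
    scale = Fraction(1, det)  # exact 1/det(A); assumes integer entries
    return [[scale * d, scale * -b], [scale * -c, scale * a]]


A = [[3, 5], [-2, 4]]  # illustrative matrix chosen for this sketch
print(det2(A))         # 22
print(inverse2(A))
```

Using `Fraction` keeps the entries of the inverse exact, which makes it easy to compare the result with a hand computation.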


Minors  and  Cofactors 


One  of  our  main  goals  in  this  chapter  is  to  obtain  an  analog  of  Formula  2 that  is  applicable  to  square  matrices  of  all  orders.  For 
this purpose we will find it convenient to use subscripted entries when writing matrices or determinants. Thus, if we denote a
2 × 2 matrix as

\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \]

then the two equations in (1) take the form

\[ \det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21} \tag{3} \]

We define the determinant of a 1 × 1 matrix $A = [a_{11}]$
as $\det(A) = \det[a_{11}] = a_{11}$.

The  following  definition  will  be  key  to  our  goal  of  extending  the  definition  of  a determinant  to  higher  order  matrices. 

DEFINITION 1

If A is a square matrix, then the minor of entry $a_{ij}$ is denoted by $M_{ij}$ and is defined to be the determinant of the
submatrix that remains after the ith row and jth column are deleted from A. The number $(-1)^{i+j}M_{ij}$ is denoted by
$C_{ij}$ and is called the cofactor of entry $a_{ij}$.


EXAMPLE  1 Finding  Minors  and  Cofactors 


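Let

\[ A = \begin{bmatrix} 3 & 1 & -4 \\ 2 & 5 & 6 \\ 1 & 4 & 8 \end{bmatrix} \]

The minor of entry $a_{11}$ is obtained by deleting the first row and first column of A:

\[ M_{11} = \begin{vmatrix} 5 & 6 \\ 4 & 8 \end{vmatrix} = 16 \]

The cofactor of $a_{11}$ is

\[ C_{11} = (-1)^{1+1}M_{11} = M_{11} = 16 \]

Similarly, the minor of entry $a_{32}$ is obtained by deleting the third row and second column of A:

\[ M_{32} = \begin{vmatrix} 3 & -4 \\ 2 & 6 \end{vmatrix} = 26 \]

The cofactor of $a_{32}$ is

\[ C_{32} = (-1)^{3+2}M_{32} = -M_{32} = -26 \]

WARNING

We have followed the standard convention of
using capital letters to denote minors and cofactors
even though they are numbers, not matrices.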
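The computations in Example 1 can be checked mechanically. The sketch below is an illustration only (the helper names `submatrix`, `minor`, and `cofactor` are assumptions of this sketch, and it handles only 3 × 3 matrices); it deletes a row and a column, takes the 2 × 2 determinant, and attaches the sign $(-1)^{i+j}$.

```python
def submatrix(a, i, j):
    """Delete row i and column j (both 1-based) from the square matrix a."""
    return [[v for c, v in enumerate(row, start=1) if c != j]
            for r, row in enumerate(a, start=1) if r != i]


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def minor(a, i, j):
    """Minor M_ij of a 3 x 3 matrix (1-based indices)."""
    return det2(submatrix(a, i, j))


def cofactor(a, i, j):
    """Cofactor C_ij = (-1)^(i+j) * M_ij of a 3 x 3 matrix."""
    return (-1) ** (i + j) * minor(a, i, j)


A = [[3, 1, -4], [2, 5, 6], [1, 4, 8]]  # the matrix of Example 1
print(minor(A, 1, 1), cofactor(A, 1, 1))  # 16 16
print(minor(A, 3, 2), cofactor(A, 3, 2))  # 26 -26
```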


The  term  determinant  was  first  introduced  by  the  German  mathematician  Carl  Friedrich 
Gauss  in  1801  (see  p.  15),  who  used  them  to  “determine”  properties  of  certain  kinds  of  functions. 
Interestingly,  the  term  matrix  is  derived  from  a Latin  word  for  “womb”  because  it  was  viewed  as  a container 
of  determinants. 


The term minor is apparently due to the English mathematician James Sylvester (see p.
34), who wrote the following in a paper published in 1850: “Now conceive any one line and any one column
be struck out, we get . . . a square, one term less in breadth and depth than the original square; and by varying
in every possible selection of the line and column excluded, we obtain, supposing the original square to
consist of n lines and n columns, n^2 such minor squares, each of which will represent what I term a ‘First
Minor Determinant’ relative to the principal or complete determinant.”


Note that a minor $M_{ij}$ and its corresponding cofactor $C_{ij}$ are either the same or negatives of each other and that the
relating sign $(-1)^{i+j}$ is either +1 or −1 in accordance with the pattern in the “checkerboard” array

\[ \begin{bmatrix} + & - & + & - & + & \cdots \\ - & + & - & + & - & \cdots \\ + & - & + & - & + & \cdots \\ - & + & - & + & - & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \end{bmatrix} \]

For example,

\[ C_{11} = M_{11}, \quad C_{21} = -M_{21}, \quad C_{22} = M_{22} \]

and so forth. Thus, it is never really necessary to calculate $(-1)^{i+j}$ to calculate $C_{ij}$; you can simply compute the minor $M_{ij}$
and then adjust the sign in accordance with the checkerboard pattern. Try this in Example 1.


EXAMPLE  2 Cofactor  Expansions  of  a 2 x 2 Matrix 


The checkerboard pattern for a 2 × 2 matrix $A = [a_{ij}]$ is

\[ \begin{bmatrix} + & - \\ - & + \end{bmatrix} \]

so that

\[ C_{11} = M_{11} = a_{22} \qquad C_{12} = -M_{12} = -a_{21} \]

\[ C_{21} = -M_{21} = -a_{12} \qquad C_{22} = M_{22} = a_{11} \]

We leave it for you to use Formula 3 to verify that det(A) can be expressed in terms of cofactors in the following
four ways:

\[ \begin{aligned} \det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} &= a_{11}C_{11} + a_{12}C_{12} \\ &= a_{21}C_{21} + a_{22}C_{22} \\ &= a_{11}C_{11} + a_{21}C_{21} \\ &= a_{12}C_{12} + a_{22}C_{22} \end{aligned} \tag{4} \]

Each of the last four equations is called a cofactor expansion of det(A). In each cofactor expansion the entries and
cofactors all come from the same row or same column of A. For example, in the first equation the entries and
cofactors all come from the first row of A, in the second they all come from the second row of A, in the third they all
come from the first column of A, and in the fourth they all come from the second column of A.


Definition  of  a General  Determinant 

Formula  4 is  a special  case  of  the  following  general  result,  which  we  will  state  without  proof. 


THEOREM  2.1.1 

If  A is  an  n x n matrix,  then  regardless  of  which  row  or  column  of  A is  chosen,  the  number  obtained  by  multiplying  the 
entries  in  that  row  or  column  by  the  corresponding  cofactors  and  adding  the  resulting  products  is  always  the  same. 


This  result  allows  us  to  make  the  following  definition. 


DEFINITION 2

If A is an n × n matrix, then the number obtained by multiplying the entries in any row or column of A by the
corresponding cofactors and adding the resulting products is called the determinant of A, and the sums themselves are
called cofactor expansions of A. That is,

\[ \det(A) = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj} \tag{5} \]

[cofactor expansion along the jth column]

and

\[ \det(A) = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} \tag{6} \]

[cofactor expansion along the ith row]
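Definition 2 is recursive: each cofactor is itself a smaller determinant. A minimal Python sketch of that recursion follows; the function names and the sample matrix are assumptions made for illustration, indices are 0-based (which does not change the parity of i + j), and the code is meant to mirror the definition rather than to be efficient.

```python
def submatrix(a, i, j):
    """Delete row i and column j (0-based) from the square matrix a."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(submatrix(a, 0, j)) for j in range(n))


def cofactor(a, i, j):
    """Cofactor C_ij (0-based indices)."""
    return (-1) ** (i + j) * det(submatrix(a, i, j))


def expand_along_row(a, i):
    """Sum of entries of row i times their cofactors."""
    return sum(a[i][j] * cofactor(a, i, j) for j in range(len(a)))


def expand_along_column(a, j):
    """Sum of entries of column j times their cofactors."""
    return sum(a[i][j] * cofactor(a, i, j) for i in range(len(a)))


A = [[3, 1, 0], [-2, -4, 3], [5, 4, -2]]  # illustrative 3 x 3 matrix
values = ([expand_along_row(A, i) for i in range(3)]
          + [expand_along_column(A, j) for j in range(3)])
print(values)  # all six expansions agree: [-1, -1, -1, -1, -1, -1]
```

That all six sums coincide is exactly what Theorem 2.1.1 asserts.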


EXAMPLE  3 Cofactor  Expansion  Along  the  First  Row 

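Find the determinant of the matrix

\[ A = \begin{bmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{bmatrix} \]

by cofactor expansion along the first row.

Solution

\[ \begin{aligned} \det(A) = \begin{vmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{vmatrix} &= 3\begin{vmatrix} -4 & 3 \\ 4 & -2 \end{vmatrix} - 1\begin{vmatrix} -2 & 3 \\ 5 & -2 \end{vmatrix} + 0\begin{vmatrix} -2 & -4 \\ 5 & 4 \end{vmatrix} \\ &= 3(-4) - (1)(-11) + 0 = -1 \end{aligned} \]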


EXAMPLE  4 Cofactor  Expansion  Along  the  First  Column 

Let A be the matrix in Example 3, and evaluate det(A) by cofactor expansion along the first column of A.

Solution

\[ \begin{aligned} \det(A) = \begin{vmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{vmatrix} &= 3\begin{vmatrix} -4 & 3 \\ 4 & -2 \end{vmatrix} - (-2)\begin{vmatrix} 1 & 0 \\ 4 & -2 \end{vmatrix} + 5\begin{vmatrix} 1 & 0 \\ -4 & 3 \end{vmatrix} \\ &= 3(-4) - (-2)(-2) + 5(3) = -1 \end{aligned} \]

This agrees with the result obtained in Example 3.

Note that in Example 4 we had to compute three
cofactors, whereas in Example 3 only two were
needed because the third was multiplied by zero.
As a rule, the best strategy for cofactor
expansion is to expand along a row or column
with the most zeros.


Charles  Lutwidge  Dodgson  (Lewis  Carroll)  (1832-1898) 

Cofactor  expansion  is  not  the  only  method  for  expressing  the  determinant  of  a matrix 
in  terms  of  determinants  of  lower  order.  For  example,  although  it  is  not  well  known,  the  English 
mathematician  Charles  Dodgson,  who  was  the  author  of  Alice's  Adventures  in  Wonderland  and  Through 
the Looking Glass under the pen name of Lewis Carroll, invented such a method, called “condensation.”
That  method  has  recently  been  resurrected  from  obscurity  because  of  its  suitability  for  parallel 
processing  on  computers. 



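EXAMPLE 5 Smart Choice of Row or Column

If A is the 4 × 4 matrix

\[ A = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 3 & 1 & 2 & 2 \\ 1 & 0 & -2 & 1 \\ 2 & 0 & 0 & 1 \end{bmatrix} \]

then to find det(A) it will be easiest to use cofactor expansion along the second column, since it has the most zeros:

\[ \det(A) = 1 \cdot \begin{vmatrix} 1 & 0 & -1 \\ 1 & -2 & 1 \\ 2 & 0 & 1 \end{vmatrix} \]

For the 3 × 3 determinant, it will be easiest to use cofactor expansion along its second column, since it has the most
zeros:

\[ \det(A) = 1 \cdot (-2) \cdot \begin{vmatrix} 1 & -1 \\ 2 & 1 \end{vmatrix} = -2(1 + 2) = -6 \]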

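EXAMPLE 6 Determinant of a Lower Triangular Matrix

The following computation shows that the determinant of a 4 × 4 lower triangular matrix is the product of its
diagonal entries. Each part of the computation uses a cofactor expansion along the first row.

\[ \begin{aligned} \begin{vmatrix} a_{11} & 0 & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ a_{31} & a_{32} & a_{33} & 0 \\ a_{41} & a_{42} & a_{43} & a_{44} \end{vmatrix} &= a_{11}\begin{vmatrix} a_{22} & 0 & 0 \\ a_{32} & a_{33} & 0 \\ a_{42} & a_{43} & a_{44} \end{vmatrix} \\ &= a_{11}a_{22}\begin{vmatrix} a_{33} & 0 \\ a_{43} & a_{44} \end{vmatrix} \\ &= a_{11}a_{22}a_{33}\,|a_{44}| = a_{11}a_{22}a_{33}a_{44} \end{aligned} \]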


The  method  illustrated  in  Example  6 can  be  easily  adapted  to  prove  the  following  general  result. 


THEOREM  2.1.2 

If A is an n × n triangular matrix (upper triangular, lower triangular, or diagonal), then det(A) is the product of the
entries on the main diagonal of the matrix; that is, $\det(A) = a_{11}a_{22}\cdots a_{nn}$.
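Theorem 2.1.2 can be spot-checked numerically by comparing the product of the diagonal entries with a full cofactor expansion. The following Python sketch does so for one upper triangular matrix; the function names and the test matrix are illustrative assumptions, and a single check is of course not a proof.

```python
from math import prod


def det(a):
    """Determinant by cofactor expansion along the first row."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j
        sub = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(sub)
    return total


def diagonal_product(a):
    """Product of the main-diagonal entries of a square matrix."""
    return prod(a[i][i] for i in range(len(a)))


U = [[1, 2, 7, -3],
     [0, 1, -4, 1],
     [0, 0, 2, 7],
     [0, 0, 0, 3]]  # illustrative upper triangular matrix
print(diagonal_product(U), det(U))  # 6 6
```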


A Useful Technique for Evaluating 2 × 2 and 3 × 3 Determinants

Determinants  of  2 x 2 and  3x3  matrices  can  be  evaluated  very  efficiently  using  the  pattern  suggested  in  Figure  2.1.1. 


Figure 2.1.1  Arrow patterns for 2 × 2 and 3 × 3 determinants; in the 3 × 3 case the first and second columns are recopied to the right of the matrix.

In  the  2x2  case,  the  determinant  can  be  computed  by  forming  the  product  of  the  entries  on  the  rightward  arrow  and 
subtracting  the  product  of  the  entries  on  the  leftward  arrow.  In  the  3 x 3 case  we  first  recopy  the  first  and  second  columns  as 
shown  in  the  figure,  after  which  we  can  compute  the  determinant  by  summing  the  products  of  the  entries  on  the  rightward 
arrows  and  subtracting  the  products  on  the  leftward  arrows.  These  procedures  execute  the  computations 

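\[ \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21} \]

\[ \begin{aligned} \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} &= a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} \\ &= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) \\ &= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{12}a_{21}a_{33} - a_{11}a_{23}a_{32} \end{aligned} \]

which agrees with the cofactor expansions along the first row.

WARNING

The arrow technique only works for determinants of
2 × 2 and 3 × 3 matrices.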


EXAMPLE  7 A Technique  for  Evaluating  2x2  and  3x3  Determinants 


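\[ \begin{vmatrix} 3 & 1 \\ 4 & -2 \end{vmatrix} = (3)(-2) - (1)(4) = -10 \]

\[ \begin{vmatrix} 1 & 2 & 3 \\ -4 & 5 & 6 \\ 7 & -8 & 9 \end{vmatrix} = [45 + 84 + 96] - [105 - 48 - 72] = 240 \]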
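The arrow technique can be sketched in code by literally recopying the first two columns and summing products along the diagonals. In the Python sketch below, the function names are hypothetical, and the two test matrices are an assumption of this sketch: they were chosen so that their arrow products reproduce the arithmetic shown in Example 7.

```python
def det2_arrows(m):
    """2 x 2 case: rightward-arrow product minus leftward-arrow product."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3_arrows(m):
    """3 x 3 case: recopy the first two columns, then follow the arrows."""
    w = [row + row[:2] for row in m]  # each row now has 5 entries
    # Products along the three rightward (downward-sloping) arrows
    rightward = sum(w[0][k] * w[1][k + 1] * w[2][k + 2] for k in range(3))
    # Products along the three leftward (upward-sloping) arrows
    leftward = sum(w[0][k + 2] * w[1][k + 1] * w[2][k] for k in range(3))
    return rightward - leftward


print(det2_arrows([[3, 1], [4, -2]]))                     # -10
print(det3_arrows([[1, 2, 3], [-4, 5, 6], [7, -8, 9]]))   # 240
```

As the warning in the text notes, this shortcut applies only to 2 × 2 and 3 × 3 matrices; the functions above make no attempt to generalize it.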


Concept  Review 

Determinant 

Minor 

Cofactor 

Cofactor  expansion 

Skills 

Find  the  minors  and  cofactors  of  a square  matrix. 

Use  cofactor  expansion  to  evaluate  the  determinant  of  a square  matrix. 

Use the arrow technique to evaluate the determinant of a 2 × 2 or 3 × 3 matrix.

Use  the  determinant  of  a 2 x 2 invertible  matrix  to  find  the  inverse  of  that  matrix. 

Find  the  determinant  of  an  upper  triangular,  lower  triangular,  or  diagonal  matrix  by  inspection. 


Exercise  Set  2.1 


In  Exercises  1-2,  find  all  the  minors  and  cofactors  of  the  matrix  A. 


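1.
\[ A = \begin{bmatrix} 1 & -2 & 3 \\ 6 & 7 & -1 \\ -3 & 1 & 4 \end{bmatrix} \]

Answer:

$M_{11} = 29$, $C_{11} = 29$
$M_{12} = 21$, $C_{12} = -21$
$M_{13} = 27$, $C_{13} = 27$
$M_{21} = -11$, $C_{21} = 11$
$M_{22} = 13$, $C_{22} = 13$
$M_{23} = -5$, $C_{23} = 5$
$M_{31} = -19$, $C_{31} = -19$
$M_{32} = -19$, $C_{32} = 19$
$M_{33} = 19$, $C_{33} = 19$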

2.
\[ A = \begin{bmatrix} 1 & 1 & 2 \\ 3 & 3 & 6 \\ 0 & 1 & 4 \end{bmatrix} \]

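3.  Let

\[ A = \begin{bmatrix} 4 & -1 & 1 & 6 \\ 0 & 0 & -3 & 3 \\ 4 & 1 & 0 & 14 \\ 4 & 1 & 3 & 2 \end{bmatrix} \]

Find

(a)  $M_{13}$ and $C_{13}$

(b)  $M_{23}$ and $C_{23}$

(c)  $M_{22}$ and $C_{22}$

(d)  $M_{21}$ and $C_{21}$

Answer:

(a)  $M_{13} = 0$, $C_{13} = 0$

(b)  $M_{23} = -96$, $C_{23} = 96$

(c)  $M_{22} = -48$, $C_{22} = -48$

(d)  $M_{21} = 72$, $C_{21} = -72$

4.  Let

\[ A = \begin{bmatrix} 2 & 3 & -1 & 1 \\ -3 & 2 & 0 & 3 \\ 3 & -2 & 1 & 0 \\ 3 & -2 & 1 & 4 \end{bmatrix} \]

Find

(a)  $M_{32}$ and $C_{32}$

(b)  $M_{44}$ and $C_{44}$

(c)  $M_{41}$ and $C_{41}$

(d)  $M_{24}$ and $C_{24}$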


In  Exercises  5-8,  evaluate  the  determinant  of  the  given  matrix.  If  the  matrix  is  invertible,  use  Equation  2 to  find  its  inverse. 


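5.
\[ \begin{bmatrix} 3 & 5 \\ -2 & 4 \end{bmatrix} \]

Answer:

\[ 22; \quad \begin{bmatrix} \frac{2}{11} & -\frac{5}{22} \\ \frac{1}{11} & \frac{3}{22} \end{bmatrix} \]

6.
\[ \begin{bmatrix} 4 & 1 \\ 8 & 2 \end{bmatrix} \]

7.
\[ \begin{bmatrix} -5 & 7 \\ -7 & -2 \end{bmatrix} \]

Answer:

\[ 59; \quad \begin{bmatrix} -\frac{2}{59} & -\frac{7}{59} \\ \frac{7}{59} & -\frac{5}{59} \end{bmatrix} \]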

/2 

4 /3 

In  Exercises  9-14,  use  the  arrow  technique  to  evaluate  the  determinant  of  the  given  matrix. 

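9.
\[ \begin{bmatrix} a - 3 & 5 \\ -3 & a - 2 \end{bmatrix} \]

Answer:

$a^2 - 5a + 21$

10.
\[ \begin{bmatrix} -2 & 7 & 6 \\ 5 & 1 & -2 \\ 3 & 8 & 4 \end{bmatrix} \]

11.
\[ \begin{bmatrix} -2 & 1 & 4 \\ 3 & 5 & -7 \\ 1 & 6 & 2 \end{bmatrix} \]

Answer:

−65

12.
\[ \begin{bmatrix} -1 & 1 & 2 \\ 3 & 0 & -5 \\ 1 & 7 & 2 \end{bmatrix} \]

13.
\[ \begin{bmatrix} 3 & 0 & 0 \\ 2 & -1 & 5 \\ 1 & 9 & -4 \end{bmatrix} \]

Answer:

−123

14.
\[ \begin{bmatrix} c & -4 & 3 \\ 2 & 1 & c^2 \\ 4 & c - 1 & 2 \end{bmatrix} \]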


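In Exercises 15-18, find all values of λ for which det(A) = 0.

15.
\[ A = \begin{bmatrix} \lambda - 2 & 1 \\ -5 & \lambda + 4 \end{bmatrix} \]

Answer:

λ = 1 or −3

16.
\[ A = \begin{bmatrix} \lambda - 4 & 0 & 0 \\ 0 & \lambda & 2 \\ 0 & 3 & \lambda - 1 \end{bmatrix} \]

17.
\[ A = \begin{bmatrix} \lambda - 1 & 0 \\ 2 & \lambda + 1 \end{bmatrix} \]

Answer:

λ = 1 or −1

18.
\[ A = \begin{bmatrix} \lambda - 4 & 4 & 0 \\ -1 & \lambda & 0 \\ 0 & 0 & \lambda - 5 \end{bmatrix} \]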

19.  Evaluate  the  determinant  of  the  matrix  in  Exercise  13  by  a cofactor  expansion  along 

(a)  the  first  row. 

(b)  the  first  column. 

(c)  the  second  row. 

(d)  the  second  column. 

(e)  the  third  row. 

(f)  the  third  column. 

Answer: 

(all parts)  −123

20.  Evaluate  the  determinant  of  the  matrix  in  Exercise  12  by  a cofactor  expansion  along 

(a)  the  first  row. 

(b)  the  first  column. 

(c)  the  second  row. 

(d)  the  second  column. 

(e)  the  third  row. 

(f)  the  third  column. 

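In Exercises 21-26, evaluate det(A) by a cofactor expansion along a row or column of your choice.

21.
\[ A = \begin{bmatrix} -3 & 0 & 7 \\ 2 & 5 & 1 \\ -1 & 0 & 5 \end{bmatrix} \]

Answer:

−40

22.
\[ A = \begin{bmatrix} 3 & 3 & 1 \\ 1 & 0 & -4 \\ 1 & -3 & 5 \end{bmatrix} \]

23.
\[ A = \begin{bmatrix} 1 & k & k^2 \\ 1 & k & k^2 \\ 1 & k & k^2 \end{bmatrix} \]

Answer:

0

24.
\[ A = \begin{bmatrix} k + 1 & k - 1 & 7 \\ 2 & k - 3 & 4 \\ 5 & k + 1 & k \end{bmatrix} \]

25.
\[ A = \begin{bmatrix} 3 & 3 & 0 & 5 \\ 2 & 2 & 0 & -2 \\ 4 & 1 & -3 & 0 \\ 2 & 10 & 3 & 2 \end{bmatrix} \]

Answer:

−240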


26. 


A = 


0 

3 
2 

4 
2 


1 

-1 

2 

2 

2 


In  Exercises  27-32,  evaluate  the  determinant  of  the  given  matrix  by  inspection. 


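27.
\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

Answer:

−1

28.
\[ \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \]

29.
\[ \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 4 & 3 & 0 \\ 1 & 2 & 3 & 8 \end{bmatrix} \]

Answer:

0

30.
\[ \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 2 \\ 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 4 \end{bmatrix} \]

31.
\[ \begin{bmatrix} 1 & 2 & 7 & -3 \\ 0 & 1 & -4 & 1 \\ 0 & 0 & 2 & 7 \\ 0 & 0 & 0 & 3 \end{bmatrix} \]

Answer:

6

32.
\[ \begin{bmatrix} -3 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 40 & 10 & -1 & 0 \\ 100 & 200 & -23 & 3 \end{bmatrix} \]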

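33.  Show that the value of the following determinant is independent of θ.

\[ \begin{vmatrix} \sin\theta & \cos\theta & 0 \\ -\cos\theta & \sin\theta & 0 \\ \sin\theta - \cos\theta & \sin\theta + \cos\theta & 1 \end{vmatrix} \]

Answer:

The determinant is $\sin^2\theta + \cos^2\theta = 1$.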

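34.  Show that the matrices

\[ A = \begin{bmatrix} a & b \\ 0 & c \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} d & e \\ 0 & f \end{bmatrix} \]

commute if and only if

\[ \begin{vmatrix} b & a - c \\ e & d - f \end{vmatrix} = 0 \]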


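35.  By inspection, what is the relationship between the following determinants?

\[ d_1 = \begin{vmatrix} a & b & c \\ d & 1 & f \\ g & 0 & 1 \end{vmatrix} \quad\text{and}\quad d_2 = \begin{vmatrix} a + \lambda & b & c \\ d & 1 & f \\ g & 0 & 1 \end{vmatrix} \]

Answer:

$d_2 = d_1 + \lambda$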

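36.  Show that

\[ \det(A) = \frac{1}{2}\begin{vmatrix} \operatorname{tr}(A) & 1 \\ \operatorname{tr}(A^2) & \operatorname{tr}(A) \end{vmatrix} \]

for every 2 × 2 matrix A.

37.  What can you say about an nth-order determinant all of whose entries are 1?  Explain your reasoning.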

38.  What  is  the  maximum  number  of  zeros  that  a 3 x 3 matrix  can  have  without  having  a zero  determinant?  Explain  your 
reasoning. 

39.  What  is  the  maximum  number  of  zeros  that  a 4 x 4 matrix  can  have  without  having  a zero  determinant?  Explain  your 
reasoning. 

40.  Prove that $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$ are collinear points if and only if

\[ \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} = 0 \]

41.  Prove that the equation of the line through the distinct points $(a_1, b_1)$ and $(a_2, b_2)$ can be written as

\[ \begin{vmatrix} x & y & 1 \\ a_1 & b_1 & 1 \\ a_2 & b_2 & 1 \end{vmatrix} = 0 \]

42.  Prove that if A is upper triangular and $B_{ij}$ is the matrix that results when the ith row and jth column of A are deleted, then
$B_{ij}$ is upper triangular if i < j.

True-False  Exercises 


In  parts  (a)-(i)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 


(a)  The determinant of the 2 × 2 matrix

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

is ad + bc.


Answer: 

False 

(b)  Two  square  matrices  A and  B can  have  the  same  determinant  only  if  they  are  the  same  size. 
Answer: 

False 

(c)  The minor $M_{ij}$ is the same as the cofactor $C_{ij}$ if and only if i + j is even.

Answer: 

True 

(d)  If A is a 3 × 3 symmetric matrix, then $C_{ij} = C_{ji}$ for all i and j.

Answer: 


True 

(e)  The  value  of  a cofactor  expansion  of  a matrix  A is  independent  of  the  row  or  column  chosen  for  the  expansion. 
Answer: 


True 

(f)  The  determinant  of  a lower  triangular  matrix  is  the  sum  of  the  entries  along  its  main  diagonal. 
Answer: 

False 

(g)  For every square matrix A and every scalar c, we have det(cA) = c det(A).

Answer:

False

(h)  For all square matrices A and B, we have det(A + B) = det(A) + det(B).

Answer:

False

(i)  For every 2 × 2 matrix A, we have $\det(A^2) = (\det(A))^2$.

Answer: 

True 
2.2  Evaluating Determinants by Row Reduction

In  this  section  we  will  show  how  to  evaluate  a determinant  by  reducing  the  associated  matrix  to  row  echelon  form.  In 
general,  this  method  requires  less  computation  than  cofactor  expansion  and  hence  is  the  method  of  choice  for  large 
matrices. 


A Basic  Theorem 

We  begin  with  a fundamental  theorem  that  will  lead  us  to  an  efficient  procedure  for  evaluating  the  determinant  of  a square 
matrix  of  any  size. 


THEOREM  2.2.1 

Let A be a square matrix. If A has a row of zeros or a column of zeros, then det(A) = 0.

Proof  Since the determinant of A can be found by a cofactor expansion along any row or column, we can use the row or
column of zeros. Thus, if we let $C_1, C_2, \ldots, C_n$ denote the cofactors of A along that row or column, then it follows from
Formula 5 or 6 in Section 2.1 that

\[ \det(A) = 0 \cdot C_1 + 0 \cdot C_2 + \cdots + 0 \cdot C_n = 0 \]


The  following  useful  theorem  relates  the  determinant  of  a matrix  and  the  determinant  of  its  transpose. 


THEOREM 2.2.2

Let A be a square matrix. Then $\det(A) = \det(A^T)$.

Because transposing a matrix changes its columns to
rows and its rows to columns, almost every theorem
about the rows of a determinant has a companion
version about columns, and vice versa.

Proof  Since transposing a matrix changes its columns to rows and its rows to columns, the cofactor expansion of A
along any row is the same as the cofactor expansion of $A^T$ along the corresponding column. Thus, both have the same
determinant.


Elementary  Row  Operations 


The next theorem shows how an elementary row operation on a square matrix affects the value of its determinant. In
place of a formal proof we have provided a table to illustrate the ideas in the 3 × 3 case (see Table 1).


THEOREM  2.2.3 

Let A be an n × n matrix.

(a)  If B is the matrix that results when a single row or single column of A is multiplied by a scalar k, then
det(B) = k det(A).

(b)  If B is the matrix that results when two rows or two columns of A are interchanged, then det(B) = −det(A).

(c)  If B is the matrix that results when a multiple of one row of A is added to another row or when a multiple of
one column is added to another column, then det(B) = det(A).
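The three parts of Theorem 2.2.3 can be checked numerically on a small example. The sketch below is illustrative only: the matrix, the scalar k, and the function name `det` are assumptions of this sketch, and the determinant is computed by cofactor expansion so that the check does not rely on the theorem itself.

```python
def det(a):
    """Determinant by cofactor expansion along the first row."""
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j]
               * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(n))


A = [[3, 1, 0], [-2, -4, 3], [5, 4, -2]]  # illustrative matrix
k = 7                                      # illustrative scalar

# (a) multiply the first row by k
scaled = [[k * v for v in A[0]], A[1], A[2]]
# (b) interchange the first and second rows
swapped = [A[1], A[0], A[2]]
# (c) add k times the second row to the first row
added = [[x + k * y for x, y in zip(A[0], A[1])], A[1], A[2]]

print(det(scaled) == k * det(A))   # True
print(det(swapped) == -det(A))     # True
print(det(added) == det(A))        # True
```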


The  first  panel  of  Table  1 shows  that  you  can  bring  a 
common  factor  from  any  row  (column)  of  a 
determinant  through  the  determinant  sign.  This  is  a 
slightly  different  way  of  thinking  about  part  (a)  of 
Theorem  2.2.3. 


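Table 1

Relationship:

\[ \begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = k\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \qquad \det(B) = k\det(A) \]

Operation: The first row of A is multiplied by k.

Relationship:

\[ \begin{vmatrix} a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = -\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \qquad \det(B) = -\det(A) \]

Operation: The first and second rows of A are interchanged.

Relationship:

\[ \begin{vmatrix} a_{11} + ka_{21} & a_{12} + ka_{22} & a_{13} + ka_{23} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \qquad \det(B) = \det(A) \]

Operation: A multiple of the second row of A is added to the first row.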

We will verify the first equation in Table 1 and leave the other two for you. To start, note that the determinants on the two
sides of the equation differ only in the first row, so these determinants have the same cofactors, $C_{11}, C_{12}, C_{13}$, along that
row (since those cofactors depend only on the entries in the second two rows). Thus, expanding the left side by cofactors
along the first row yields

\[ \begin{aligned} \begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} &= ka_{11}C_{11} + ka_{12}C_{12} + ka_{13}C_{13} \\ &= k(a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13}) \\ &= k\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \end{aligned} \]

Elementary  Matrices 

It will be useful to consider the special case of Theorem 2.2.3 in which $A = I_n$ is the n × n identity matrix and E (rather
than B) denotes the elementary matrix that results when the row operation is performed on $I_n$. In this special case
Theorem 2.2.3 implies the following result.


THEOREM 2.2.4

Let E be an n × n elementary matrix.

(a)  If E results from multiplying a row of $I_n$ by a nonzero number k, then det(E) = k.

(b)  If E results from interchanging two rows of $I_n$, then det(E) = −1.

(c)  If E results from adding a multiple of one row of $I_n$ to another, then det(E) = 1.


EXAMPLE  1 Determinants  of  Elementary  Matrices 

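The following determinants of elementary matrices, which are evaluated by inspection, illustrate Theorem
2.2.4.

\[ \begin{vmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 3, \qquad \begin{vmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{vmatrix} = -1, \qquad \begin{vmatrix} 1 & 0 & 0 & 7 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{vmatrix} = 1 \]

In the first determinant, the second row of $I_4$ was multiplied by 3. In the second, the first and last rows of $I_4$ were
interchanged. In the third, 7 times the last row of $I_4$ was added to the first row.

Observe that the determinant of an elementary
matrix cannot be zero.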


Matrices  with  Proportional  Rows  or  Columns 


If a square matrix A has two proportional rows, then a row of zeros can be introduced by adding a suitable multiple of one
of the rows to the other. Similarly for columns. But adding a multiple of one row or column to another does not change
the determinant, so from Theorem 2.2.1, we must have det(A) = 0. This proves the following theorem.


THEOREM 2.2.5

If A is a square matrix with two proportional rows or two proportional columns, then det(A) = 0.


EXAMPLE  2 Introducing  Zero  Rows 


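The following computation shows how to introduce a row of zeros when there are two proportional rows.

\[ \begin{vmatrix} 1 & 3 & -2 & 4 \\ 2 & 6 & -4 & 8 \\ 3 & 9 & 1 & 5 \\ 1 & 1 & 4 & 8 \end{vmatrix} = \begin{vmatrix} 1 & 3 & -2 & 4 \\ 0 & 0 & 0 & 0 \\ 3 & 9 & 1 & 5 \\ 1 & 1 & 4 & 8 \end{vmatrix} = 0 \]

The second row is 2 times the first, so we added −2 times
the first row to the second to introduce a row of zeros.

Each of the following matrices has two proportional rows or columns; thus, each has a determinant of zero.

\[ \begin{bmatrix} -1 & 4 \\ -2 & 8 \end{bmatrix}, \qquad \begin{bmatrix} 1 & -2 & 7 \\ -4 & 8 & 5 \\ 2 & -4 & 3 \end{bmatrix}, \qquad \begin{bmatrix} 3 & -1 & 4 & -5 \\ 6 & -2 & 5 & 2 \\ 5 & 8 & 1 & 4 \\ -9 & 3 & -12 & 15 \end{bmatrix} \]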

Evaluating  Determinants  by  Row  Reduction 


We  will  now  give  a method  for  evaluating  determinants  that  involves  substantially  less  computation  than  cofactor 
expansion.  The  idea  of  the  method  is  to  reduce  the  given  matrix  to  upper  triangular  form  by  elementary  row  operations, 
then  compute  the  determinant  of  the  upper  triangular  matrix  (an  easy  computation),  and  then  relate  that  determinant  to 
that  of  the  original  matrix.  Here  is  an  example. 


EXAMPLE  3 Using  Row  Reduction  to  Evaluate  a Determinant 


Evaluate det(A) where

\[ A = \begin{bmatrix} 0 & 1 & 5 \\ 3 & -6 & 9 \\ 2 & 6 & 1 \end{bmatrix} \]

We will reduce A to row echelon form (which is upper triangular) and then apply Theorem 2.1.2.

\[ \det(A) = \begin{vmatrix} 0 & 1 & 5 \\ 3 & -6 & 9 \\ 2 & 6 & 1 \end{vmatrix} = -\begin{vmatrix} 3 & -6 & 9 \\ 0 & 1 & 5 \\ 2 & 6 & 1 \end{vmatrix} \]

(The first and second rows of A were interchanged.)

\[ = -3\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 2 & 6 & 1 \end{vmatrix} \]

(A common factor of 3 from the first row was taken through the determinant sign.)

\[ = -3\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 10 & -5 \end{vmatrix} \]

(−2 times the first row was added to the third row.)

\[ = -3\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 0 & -55 \end{vmatrix} \]

(−10 times the second row was added to the third row.)

\[ = (-3)(-55)\begin{vmatrix} 1 & -2 & 3 \\ 0 & 1 & 5 \\ 0 & 0 & 1 \end{vmatrix} = (-3)(-55)(1) = 165 \]

(A common factor of −55 from the last row was taken through the determinant sign.)

Even with today’s fastest computers it would
take millions of years to calculate a 25 × 25
determinant by cofactor expansion, so
methods based on row reduction are often
used for large determinants. For determinants
of small size (such as those in this text),
cofactor expansion is often a reasonable
choice.

EXAMPLE  4 Using  Column  Operations  to  Evaluate  a Determinant 


Compute the determinant of

\[ A = \begin{bmatrix} 1 & 0 & 0 & 3 \\ 2 & 7 & 0 & 6 \\ 0 & 6 & 3 & 0 \\ 7 & 3 & 1 & -5 \end{bmatrix} \]

This determinant could be computed as above by using elementary row operations to reduce A to
row echelon form, but we can put A in lower triangular form in one step by adding −3 times the first column
to the fourth to obtain

\[ \det(A) = \det\begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 7 & 0 & 0 \\ 0 & 6 & 3 & 0 \\ 7 & 3 & 1 & -26 \end{bmatrix} = (1)(7)(3)(-26) = -546 \]

Example 4 points out that it is always wise to keep
an eye open for column operations that can shorten
computations.

Cofactor  expansion  and  row  or  column  operations  can  sometimes  be  used  in  combination  to  provide  an  effective  method 
for  evaluating  determinants.  The  following  example  illustrates  this  idea. 

EXAMPLE  5 Row  Operations  and  Cofactor  Expansion 


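Evaluate det(A) where

\[ A = \begin{bmatrix} 3 & 5 & -2 & 6 \\ 1 & 2 & -1 & 1 \\ 2 & 4 & 1 & 5 \\ 3 & 7 & 5 & 3 \end{bmatrix} \]

By adding suitable multiples of the second row to the remaining rows, we obtain

\[ \det(A) = \begin{vmatrix} 0 & -1 & 1 & 3 \\ 1 & 2 & -1 & 1 \\ 0 & 0 & 3 & 3 \\ 0 & 1 & 8 & 0 \end{vmatrix} = -\begin{vmatrix} -1 & 1 & 3 \\ 0 & 3 & 3 \\ 1 & 8 & 0 \end{vmatrix} \]

(Cofactor expansion along the first column.)

\[ = -\begin{vmatrix} -1 & 1 & 3 \\ 0 & 3 & 3 \\ 0 & 9 & 3 \end{vmatrix} \]

(We added the first row to the third row.)

\[ = -(-1)\begin{vmatrix} 3 & 3 \\ 9 & 3 \end{vmatrix} = -18 \]

(Cofactor expansion along the first column.)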


Skills 

Know  the  effect  of  elementary  row  operations  on  the  value  of  a determinant. 

Know  the  determinants  of  the  three  types  of  elementary  matrices. 

Know  how  to  introduce  zeros  into  the  rows  or  columns  of  a matrix  to  facilitate  the  evaluation  of  its  determinant. 
Use  row  reduction  to  evaluate  the  determinant  of  a matrix. 

Use  column  operations  to  evaluate  the  determinant  of  a matrix. 

Combine  the  use  of  row  reduction  and  cofactor  expansion  to  evaluate  the  determinant  of  a matrix. 


Exercise  Set  2.2 

In Exercises 1-4, verify that $\det(A) = \det(A^T)$.


A=  1 2 4 

_5  -3  6_ 

4.
\[ A = \begin{bmatrix} 4 & 2 & -1 \\ 0 & 2 & -3 \\ -1 & 1 & 5 \end{bmatrix} \]

In  Exercises  5-9,  find  the  determinant  of  the  given  elementary  matrix  by  inspection. 

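5.
\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -5 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Answer:

−5

6.
\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -5 & 0 & 1 \end{bmatrix} \]

7.
\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Answer:

−1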

8. 1"  1 0 0 o’ 

0-^00 

0 0 10 
0 0 0 1 

9.
\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -9 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]

Answer:

1

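In Exercises 10-17, evaluate the determinant of the given matrix by reducing the matrix to row echelon form.

10.
\[ \begin{bmatrix} 3 & 6 & -9 \\ 0 & 0 & -2 \\ -2 & 1 & 5 \end{bmatrix} \]

11.
\[ \begin{bmatrix} 0 & 3 & 1 \\ 1 & 1 & 2 \\ 3 & 2 & 4 \end{bmatrix} \]

Answer:

5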

12.  r -3  O' 

-2  4 1 

-2  2_ 

13.  f -6  9' 

-2  7 -2 

1 5 

Answer: 

33 

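14.
\[ \begin{bmatrix} 1 & -2 & 3 & 1 \\ 5 & -9 & 6 & 3 \\ -1 & 2 & -6 & -2 \\ 2 & 8 & 6 & 1 \end{bmatrix} \]

15.
\[ \begin{bmatrix} 2 & 1 & 3 & 1 \\ 1 & 0 & 1 & 1 \\ 0 & 2 & 1 & 0 \\ 0 & 1 & 2 & 3 \end{bmatrix} \]

Answer:

6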

16.  r oi  i i 


-if  0 0 

17.  r i 31 

-2  -7  0 
0 1 
0 2 
0 0 

Answer: 

-2 

18.  Repeat  Exercises  10-13  by  using  a combination  of  row  reduction  and  cofactor  expansion. 

19.  Repeat  Exercises  14-17  by  using  a combination  of  row  operations  and  cofactor  expansion. 

Answer: 

Exercise  14:  39;  Exercise  15:  6;  Exercise  16:  — -i;  Exercise  17:  —2 


5 3 
-4  2 
0 1 
1 1 
1 1 


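In Exercises 20-27, evaluate the determinant, given that

\[ \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = -6 \]

20.
\[ \begin{vmatrix} g & h & i \\ d & e & f \\ a & b & c \end{vmatrix} \]

21.
\[ \begin{vmatrix} d & e & f \\ g & h & i \\ a & b & c \end{vmatrix} \]

Answer:

−6

22.
\[ \begin{vmatrix} a & b & c \\ d & e & f \\ 2a & 2b & 2c \end{vmatrix} \]

23.
\[ \begin{vmatrix} 3a & 3b & 3c \\ -d & -e & -f \\ 4g & 4h & 4i \end{vmatrix} \]

Answer:

72

24.
\[ \begin{vmatrix} a + d & b + e & c + f \\ -d & -e & -f \\ g & h & i \end{vmatrix} \]

25.
\[ \begin{vmatrix} a + g & b + h & c + i \\ d & e & f \\ g & h & i \end{vmatrix} \]

Answer:

−6

26.
\[ \begin{vmatrix} a & b & c \\ 2d & 2e & 2f \\ g + 3a & h + 3b & i + 3c \end{vmatrix} \]

27.
\[ \begin{vmatrix} -3a & -3b & -3c \\ d & e & f \\ g - 4d & h - 4e & i - 4f \end{vmatrix} \]

Answer:

18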


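28.  Show that

(a)
\[ \det\begin{bmatrix} 0 & 0 & a_{13} \\ 0 & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = -a_{13}a_{22}a_{31} \]

(b)
\[ \det\begin{bmatrix} 0 & 0 & 0 & a_{14} \\ 0 & 0 & a_{23} & a_{24} \\ 0 & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} = a_{14}a_{23}a_{32}a_{41} \]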


29.  Use row reduction to show that

\[ \begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{vmatrix} = (b - a)(c - a)(c - b) \]

In Exercises 30-33, confirm the identities without evaluating the determinants directly.


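30.
\[ \begin{vmatrix} a_1 + b_1t & a_2 + b_2t & a_3 + b_3t \\ a_1t + b_1 & a_2t + b_2 & a_3t + b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = (1 - t^2)\begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} \]

31.
\[ \begin{vmatrix} a_1 & b_1 & a_1 + b_1 + c_1 \\ a_2 & b_2 & a_2 + b_2 + c_2 \\ a_3 & b_3 & a_3 + b_3 + c_3 \end{vmatrix} = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} \]

32.
\[ \begin{vmatrix} a_1 & b_1 + ta_1 & c_1 + rb_1 + sa_1 \\ a_2 & b_2 + ta_2 & c_2 + rb_2 + sa_2 \\ a_3 & b_3 + ta_3 & c_3 + rb_3 + sa_3 \end{vmatrix} = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} \]

33.
\[ \begin{vmatrix} a_1 + b_1 & a_1 - b_1 & c_1 \\ a_2 + b_2 & a_2 - b_2 & c_2 \\ a_3 + b_3 & a_3 - b_3 & c_3 \end{vmatrix} = -2\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} \]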

34.  Find  the  determinant  of  the  following  matrix. 

a b b b 
b a b b 
b b a b 
b b b a 


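In Exercises 35-36, show that det(A) = 0 without directly evaluating the determinant.

35.
\[ A = \begin{bmatrix} -2 & 8 & 1 & 4 \\ 3 & 2 & 5 & 1 \\ 1 & 10 & 6 & 5 \\ 4 & -6 & 4 & -3 \end{bmatrix} \]

36.
\[ A = \begin{bmatrix} -4 & 1 & 1 & 1 & 1 \\ 1 & -4 & 1 & 1 & 1 \\ 1 & 1 & -4 & 1 & 1 \\ 1 & 1 & 1 & -4 & 1 \\ 1 & 1 & 1 & 1 & -4 \end{bmatrix} \]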


True-False  Exercises 


In  parts  (a)-(f)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 


(a)  If A is a 4 × 4 matrix and B is obtained from A by interchanging the first two rows and then interchanging the last two
rows, then det(B) = det(A).

Answer:

True

(b)  If A is a 3 × 3 matrix and B is obtained from A by multiplying the first column by 4 and multiplying the third column
by 3/4, then det(B) = 3 det(A).

Answer:

True

(c)  If A is a 3 × 3 matrix and B is obtained from A by adding 5 times the first row to each of the second and third rows,
then det(B) = 25 det(A).

Answer:

False

(d)  If A is an n × n matrix and B is obtained from A by multiplying each row of A by its row number, then

\[ \det(B) = \frac{n(n+1)}{2}\det(A) \]

Answer:

False

(e)  If A is a square matrix with two identical columns, then det(A) = 0.

Answer:

True

(f)  If the sum of the second and fourth row vectors of a 6 × 6 matrix A is equal to the last row vector, then det(A) = 0.

Answer:

True
2.3  Properties of Determinants; Cramer's Rule

In  this  section  we  will  develop  some  fundamental  properties  of  matrices,  and  we  will  use  these  results  to  derive  a 
formula  for  the  inverse  of  an  invertible  matrix  and  formulas  for  the  solutions  of  certain  kinds  of  linear  systems. 


Basic  Properties  of  Determinants 


Suppose that A and B are n x n matrices and k is any scalar. We begin by considering possible relationships
between det(A), det(B), and

det(kA), det(A + B), and det(AB)

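Since a common factor of any row of a matrix can be moved through the determinant sign, and since each of the
n rows in kA has a common factor of k, it follows that

\[
\det(kA) = k^n \det(A) \tag{1}
\]

For example,

\[
\begin{vmatrix} ka_{11} & ka_{12} & ka_{13} \\ ka_{21} & ka_{22} & ka_{23} \\ ka_{31} & ka_{32} & ka_{33} \end{vmatrix}
= k^3 \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
\]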
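As a quick numerical check of Formula 1, the following sketch compares det(kA) with k^n det(A) for one sample matrix. The helper names `det` and `scale`, the matrix, and the scalar are illustrative choices of ours, not part of the text.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1  # determinant of the empty (0 x 0) matrix
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def scale(k, m):
    """Return the matrix kA (every entry of A multiplied by k)."""
    return [[k * x for x in row] for row in m]


A = [[1, 2, 3], [0, 4, 5], [1, 0, 6]]  # arbitrary 3 x 3 sample matrix
k = 5
n = len(A)

lhs = det(scale(k, A))   # det(kA)
rhs = k ** n * det(A)    # k^n det(A)
print(lhs == rhs)
```

Because each of the n rows contributes one factor of k, the exponent is the size of the matrix, not 1.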


Unfortunately, no simple relationship exists among det(A), det(B), and det(A + B). In particular, we emphasize
that det(A + B) will usually not be equal to det(A) + det(B). The following example illustrates this fact.

EXAMPLE 1 det(A + B) ≠ det(A) + det(B)

Consider

\[
A = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}, \quad
B = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}, \quad
A + B = \begin{bmatrix} 4 & 3 \\ 3 & 8 \end{bmatrix}
\]

We have det(A) = 1, det(B) = 8, and det(A + B) = 23; thus

det(A + B) ≠ det(A) + det(B)
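The values quoted in Example 1 can be checked with a few lines of code. In this sketch the two 2 x 2 matrices are taken to be the ones that reproduce the stated determinants 1, 8, and 23; the helper names `det2` and `add` are our own.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def add(a, b):
    """Entrywise sum of two matrices of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


A = [[1, 2], [2, 5]]
B = [[3, 1], [1, 3]]
S = add(A, B)

print(det2(A), det2(B), det2(S))
print(det2(S) == det2(A) + det2(B))
```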


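In spite of the previous example, there is a useful relationship concerning sums of determinants that is applicable
when the matrices involved are the same except for one row (column). For example, consider the following two
matrices that differ only in the second row:

\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} a_{11} & a_{12} \\ b_{21} & b_{22} \end{bmatrix}
\]

Calculating the determinants of A and B we obtain

\[
\begin{aligned}
\det(A) + \det(B) &= (a_{11}a_{22} - a_{12}a_{21}) + (a_{11}b_{22} - a_{12}b_{21}) \\
&= a_{11}(a_{22} + b_{22}) - a_{12}(a_{21} + b_{21}) \\
&= \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}
\end{aligned}
\]

Thus

\[
\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
+ \det \begin{bmatrix} a_{11} & a_{12} \\ b_{21} & b_{22} \end{bmatrix}
= \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}
\]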

This  is  a special  case  of  the  following  general  result. 


THEOREM  2.3.1 

Let A, B, and C be n x n matrices that differ only in a single row, say the rth, and assume that the rth row
of C can be obtained by adding corresponding entries in the rth rows of A and B. Then

det(C) = det(A) + det(B)

The same result holds for columns.


EXAMPLE  2 Sums  of  Determinants 

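We leave it to you to confirm the following equality by evaluating the determinants.

\[
\det \begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 1 + 0 & 4 + 1 & 7 + (-1) \end{bmatrix}
= \det \begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 1 & 4 & 7 \end{bmatrix}
+ \det \begin{bmatrix} 1 & 7 & 5 \\ 2 & 0 & 3 \\ 0 & 1 & -1 \end{bmatrix}
\]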
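The equality in Example 2 can also be confirmed by machine. The sketch below builds C by adding the third rows of A and B, as in Theorem 2.3.1; the helper `det3` (the arrow technique for 3 x 3 matrices) and the variable names are our own.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by the arrow technique."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * e * i + b * f * g + c * d * h - c * e * g - a * f * h - b * d * i


A = [[1, 7, 5], [2, 0, 3], [1, 4, 7]]
B = [[1, 7, 5], [2, 0, 3], [0, 1, -1]]

# C agrees with A and B except in row 3, which is the sum of their third rows.
C = [A[0], A[1], [x + y for x, y in zip(A[2], B[2])]]

print(det3(A), det3(B), det3(C))
print(det3(C) == det3(A) + det3(B))
```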

Determinant  of  a Matrix  Product 

Considering  the  complexity  of  the  formulas  for  determinants  and  matrix  multiplication,  it  would  seem  unlikely 
that  a simple  relationship  should  exist  between  them.  This  is  what  makes  the  simplicity  of  our  next  result  so 
surprising.  We  will  show  that  if  A and  B are  square  matrices  of  the  same  size,  then 

\[
\det(AB) = \det(A)\det(B) \tag{2}
\]

The proof of this theorem is fairly intricate, so we will have to develop some preliminary results first. We begin
with the special case of Formula 2 in which A is an elementary matrix. Because this special case is only a prelude
to Formula 2, we call it a lemma.


LEMMA  2.3.2 


If B is an n x n matrix and E is an n x n elementary matrix, then

det(EB) = det(E) det(B)

Proof  We will consider three cases, each in accordance with the row operation that produces the matrix E.

Case 1  If E results from multiplying a row of I_n by k, then by Theorem 1.5.1, EB results from B by multiplying
the corresponding row by k; so from Theorem 2.2.3(a) we have

det(EB) = k det(B)

But from Theorem 2.2.4(a) we have det(E) = k, so

det(EB) = det(E) det(B)

Cases 2 and 3  The proofs of the cases where E results from interchanging two rows of I_n or from adding a
multiple of one row to another follow the same pattern as Case 1 and are left as exercises.

It follows by repeated applications of Lemma 2.3.2 that if B is an n x n matrix and E_1, E_2, ..., E_r are
n x n elementary matrices, then

\[
\det(E_1 E_2 \cdots E_r B) = \det(E_1)\det(E_2)\cdots\det(E_r)\det(B) \tag{3}
\]
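Lemma 2.3.2 can be spot-checked for one elementary matrix of each type. This is only a numerical illustration under our own choices (3 x 3 matrices, the sample matrix B, and the helper names `det`, `matmul`, and `identity`); it is not a substitute for the proof.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1  # determinant of the empty (0 x 0) matrix
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


E_scale = identity(3)
E_scale[1][1] = 4                             # multiply row 2 of I_3 by 4
E_swap = identity(3)
E_swap[0], E_swap[1] = E_swap[1], E_swap[0]   # interchange rows 1 and 2
E_add = identity(3)
E_add[2][0] = 5                               # add 5 times row 1 to row 3

B = [[1, 2, 3], [0, 4, 5], [1, 0, 6]]         # arbitrary sample matrix

for E in (E_scale, E_swap, E_add):
    print(det(E), det(matmul(E, B)) == det(E) * det(B))
```

The three elementary matrices have determinants k, -1, and 1 respectively, which is why none of them can be zero.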


Determinant  Test  for  Invertibility 

Our  next  theorem  provides  an  important  criterion  for  determining  whether  a matrix  is  invertible.  It  also  takes  us  a 
step  closer  to  establishing  Formula  2. 


THEOREM  2.3.3 

A square matrix A is invertible if and only if det(A) ≠ 0.


Proof  Let R be the reduced row echelon form of A. As a preliminary step, we will show that det(A) and det(R)
are both zero or both nonzero: Let E_1, E_2, ..., E_r be the elementary matrices that correspond to the elementary
row operations that produce R from A. Thus

\[
R = E_r \cdots E_2 E_1 A
\]

and from 3,

\[
\det(R) = \det(E_r) \cdots \det(E_2)\det(E_1)\det(A) \tag{4}
\]

We pointed out in the margin note that accompanies Theorem 2.2.4 that the determinant of an elementary matrix
is nonzero. Thus, it follows from Formula 4 that det(A) and det(R) are either both zero or both nonzero, which
sets the stage for the main part of the proof. If we assume first that A is invertible, then it follows from Theorem
1.6.4 that R = I and hence that det(R) = 1 (≠ 0). This, in turn, implies that det(A) ≠ 0, which is what we
wanted to show.


Conversely, assume that det(A) ≠ 0. It follows from this that det(R) ≠ 0, which tells us that R cannot have a row
of zeros. Thus, it follows from Theorem 1.4.3 that R = I and hence that A is invertible by Theorem 1.6.4.

It follows from Theorems 2.3.3 and 2.2.5 that a square matrix with two proportional
rows or two proportional columns is not invertible.

EXAMPLE  3 Determinant  Test  for  Invertibility 


Since the first and third rows of

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 1 \\ 2 & 4 & 6 \end{bmatrix}
\]

are proportional, det(A) = 0. Thus A is not invertible.


We  are  now  ready  for  the  main  result  concerning  products  of  matrices. 


THEOREM  2.3.4 

If A and B are square matrices of the same size, then

det(AB) = det(A) det(B)

Proof  We divide the proof into two cases that depend on whether or not A is invertible. If the matrix A is not
invertible, then by Theorem 1.6.5 neither is the product AB. Thus, from Theorem 2.3.3, we have
det(AB) = 0 and det(A) = 0, so it follows that det(AB) = det(A) det(B).


Augustin  Louis  Cauchy  (1789-1857) 

In  1815  the  great  French  mathematician  Augustin  Cauchy  published  a landmark  paper 
in which he gave the first systematic and modern treatment of determinants. It was in that paper that
Theorem 2.3.4 was stated and proved in full generality for the first time. Special cases of the theorem had
been stated and proved earlier, but it was Cauchy who made the final jump.

Now assume that A is invertible. By Theorem 1.6.4, the matrix A is expressible as a product of elementary
matrices, say

\[
A = E_1 E_2 \cdots E_r \tag{5}
\]

so

\[
AB = E_1 E_2 \cdots E_r B
\]

Applying 3 to this equation yields

\[
\det(AB) = \det(E_1)\det(E_2)\cdots\det(E_r)\det(B)
\]

and applying 3 again yields

\[
\det(AB) = \det(E_1 E_2 \cdots E_r)\det(B)
\]

which, from 5, can be written as det(AB) = det(A) det(B).

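EXAMPLE 4 Verifying That det(AB) = det(A) det(B)

Consider the matrices

\[
A = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} -1 & 3 \\ 5 & 8 \end{bmatrix}, \quad
AB = \begin{bmatrix} 2 & 17 \\ 3 & 14 \end{bmatrix}
\]

We leave it for you to verify that

det(A) = 1, det(B) = -23, and det(AB) = -23

Thus det(AB) = det(A) det(B), as guaranteed by Theorem 2.3.4.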

The  following  theorem  gives  a useful  relationship  between  the  determinant  of  an  invertible  matrix  and  the 
determinant  of  its  inverse. 


THEOREM  2.3.5 


If A is invertible, then

\[
\det(A^{-1}) = \frac{1}{\det(A)}
\]

Proof  Since A^{-1}A = I, it follows that det(A^{-1}A) = det(I). Therefore, we must have det(A^{-1}) det(A) = 1.
Since det(A) ≠ 0, the proof can be completed by dividing through by det(A).


Adjoint  of  a Matrix 

In a cofactor expansion we compute det(A) by multiplying the entries in a row or column by their cofactors and
adding  the  resulting  products.  It  turns  out  that  if  one  multiplies  the  entries  in  any  row  by  the  corresponding 
cofactors  from  a different  row,  the  sum  of  these  products  is  always  zero.  (This  result  also  holds  for  columns.) 
Although  we  omit  the  general  proof,  the  next  example  illustrates  the  idea  of  the  proof  in  a special  case. 


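It follows from Theorems 2.3.5 and 2.1.2 that if A is an invertible triangular matrix, then

\[
\det(A^{-1}) = \frac{1}{a_{11}} \cdot \frac{1}{a_{22}} \cdots \frac{1}{a_{nn}}
\]

Moreover, by using the adjoint formula it is possible to show that

\[
\frac{1}{a_{11}}, \; \frac{1}{a_{22}}, \; \ldots, \; \frac{1}{a_{nn}}
\]

are actually the successive diagonal entries of A^{-1} (compare A and A^{-1} in Example 3 of
Section 1.7).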


EXAMPLE  5 Entries  and  Cofactors  from  Different  Rows 


Let

\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\]

Consider the quantity

\[
a_{11}C_{31} + a_{12}C_{32} + a_{13}C_{33}
\]

that is formed by multiplying the entries in the first row by the cofactors of the corresponding entries
in the third row and adding the resulting products. We can show that this quantity is equal to zero by
the following trick: Construct a new matrix A' by replacing the third row of A with another copy of the
first row. That is,

\[
A' = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{11} & a_{12} & a_{13} \end{bmatrix}
\]

Let C'_{31}, C'_{32}, C'_{33} be the cofactors of the entries in the third row of A'. Since the first two rows of A
and A' are the same, and since the computations of C_{31}, C_{32}, C_{33}, C'_{31}, C'_{32}, and C'_{33} involve only
entries from the first two rows of A and A', it follows that

\[
C_{31} = C'_{31}, \quad C_{32} = C'_{32}, \quad C_{33} = C'_{33}
\]

Since A' has two identical rows, it follows from Theorem 2.2.5 that

\[
\det(A') = 0 \tag{6}
\]

On the other hand, evaluating det(A') by cofactor expansion along the third row gives

\[
\det(A') = a_{11}C'_{31} + a_{12}C'_{32} + a_{13}C'_{33} = a_{11}C_{31} + a_{12}C_{32} + a_{13}C_{33} \tag{7}
\]

From 6 and 7 we obtain

\[
a_{11}C_{31} + a_{12}C_{32} + a_{13}C_{33} = 0
\]
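The conclusion of Example 5 is easy to observe numerically. The sketch below pairs the entries of one row with the cofactors of another row (the sum is zero) and then with the cofactors of the same row (the sum is the determinant). The sample matrix and the helper names `det` and `cofactor` are our own choices; indices in the code start at 0.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1  # determinant of the empty (0 x 0) matrix
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def cofactor(m, i, j):
    """Cofactor C_ij: signed determinant of m with row i and column j deleted."""
    minor = [row[:j] + row[j + 1:] for r, row in enumerate(m) if r != i]
    return (-1) ** (i + j) * det(minor)


A = [[3, 2, -1], [1, 6, 3], [2, -4, 0]]

# Entries of row 1 times cofactors of row 3 (different rows).
mixed = sum(A[0][j] * cofactor(A, 2, j) for j in range(3))
# Entries of row 1 times cofactors of row 1 (ordinary cofactor expansion).
same = sum(A[0][j] * cofactor(A, 0, j) for j in range(3))

print(mixed, same, det(A))
```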


DEFINITION  1 

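If A is any n x n matrix and C_{ij} is the cofactor of a_{ij}, then the matrix

\[
\begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1n} \\ C_{21} & C_{22} & \cdots & C_{2n} \\ \vdots & \vdots & & \vdots \\ C_{n1} & C_{n2} & \cdots & C_{nn} \end{bmatrix}
\]

is called the matrix of cofactors from A. The transpose of this matrix is called the adjoint of A and is
denoted by adj(A).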


EXAMPLE  6 Adjoint  of  a 3 x 3 Matrix 


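Let

\[
A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & 6 & 3 \\ 2 & -4 & 0 \end{bmatrix}
\]

The cofactors of A are

\[
\begin{array}{lll}
C_{11} = 12 & C_{12} = 6 & C_{13} = -16 \\
C_{21} = 4 & C_{22} = 2 & C_{23} = 16 \\
C_{31} = 12 & C_{32} = -10 & C_{33} = 16
\end{array}
\]

so the matrix of cofactors is

\[
\begin{bmatrix} 12 & 6 & -16 \\ 4 & 2 & 16 \\ 12 & -10 & 16 \end{bmatrix}
\]

and the adjoint of A is

\[
\mathrm{adj}(A) = \begin{bmatrix} 12 & 4 & 12 \\ 6 & 2 & -10 \\ -16 & 16 & 16 \end{bmatrix}
\]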

Leonard  Eugene  Dickson  (1874-1954) 

The  use  of  the  term  adjoint  for  the  transpose  of  the  matrix  of  cofactors  appears  to  have 
been  introduced  by  the  American  mathematician  L.  E.  Dickson  in  a research  paper  that  he  published  in 
1902.

In Theorem 1.4.5 we gave a formula for the inverse of a 2 x 2 invertible matrix. Our next theorem extends that
result to n x n invertible matrices.

THEOREM 2.3.6 Inverse of a Matrix Using Its Adjoint

If A is an invertible matrix, then

\[
A^{-1} = \frac{1}{\det(A)} \, \mathrm{adj}(A) \tag{8}
\]


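Proof  We show first that

\[
A \, \mathrm{adj}(A) = \det(A) I
\]

Consider the product

\[
A \, \mathrm{adj}(A) =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{in} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{bmatrix}
\begin{bmatrix}
C_{11} & C_{21} & \cdots & C_{j1} & \cdots & C_{n1} \\
C_{12} & C_{22} & \cdots & C_{j2} & \cdots & C_{n2} \\
\vdots & \vdots & & \vdots & & \vdots \\
C_{1n} & C_{2n} & \cdots & C_{jn} & \cdots & C_{nn}
\end{bmatrix}
\]

The entry in the ith row and jth column of the product A adj(A) is

\[
a_{i1}C_{j1} + a_{i2}C_{j2} + \cdots + a_{in}C_{jn} \tag{9}
\]

(the ith row of A combined with the jth column of adj(A)).

If i = j, then 9 is the cofactor expansion of det(A) along the ith row of A (Theorem 2.1.1), and if i ≠ j, then the
a's and the cofactors come from different rows of A, so the value of 9 is zero. Therefore,

\[
A \, \mathrm{adj}(A) =
\begin{bmatrix}
\det(A) & 0 & \cdots & 0 \\
0 & \det(A) & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \det(A)
\end{bmatrix}
= \det(A) I \tag{10}
\]

Since A is invertible, det(A) ≠ 0. Therefore, Equation 10 can be rewritten as

\[
\frac{1}{\det(A)} [A \, \mathrm{adj}(A)] = I
\quad \text{or} \quad
A \left[ \frac{1}{\det(A)} \, \mathrm{adj}(A) \right] = I
\]

Multiplying both sides on the left by A^{-1} yields

\[
A^{-1} = \frac{1}{\det(A)} \, \mathrm{adj}(A)
\]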


EXAMPLE  7 Using  the  Adjoint  to  Find  an  Inverse  Matrix 


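Use 8 to find the inverse of the matrix A in Example 6.

We leave it for you to check that det(A) = 64. Thus

\[
A^{-1} = \frac{1}{\det(A)} \, \mathrm{adj}(A)
= \frac{1}{64} \begin{bmatrix} 12 & 4 & 12 \\ 6 & 2 & -10 \\ -16 & 16 & 16 \end{bmatrix}
= \begin{bmatrix}
\frac{12}{64} & \frac{4}{64} & \frac{12}{64} \\
\frac{6}{64} & \frac{2}{64} & -\frac{10}{64} \\
-\frac{16}{64} & \frac{16}{64} & \frac{16}{64}
\end{bmatrix}
\]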
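The adjoint method of Examples 6 and 7 translates directly into code. The following is a minimal sketch, assuming integer entries and exact arithmetic with Python's `fractions.Fraction`; the function names `det`, `cofactor`, `adjoint`, and `inverse` are our own and do not come from any library.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1  # determinant of the empty (0 x 0) matrix
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def cofactor(m, i, j):
    """Cofactor C_ij: signed determinant of m with row i and column j deleted."""
    minor = [row[:j] + row[j + 1:] for r, row in enumerate(m) if r != i]
    return (-1) ** (i + j) * det(minor)


def adjoint(m):
    """Transpose of the matrix of cofactors: entry (i, j) is C_ji."""
    n = len(m)
    return [[cofactor(m, j, i) for j in range(n)] for i in range(n)]


def inverse(m):
    """Inverse via Formula 8: (1 / det(A)) adj(A), for an integer matrix."""
    d = det(m)
    if d == 0:
        raise ValueError("matrix is not invertible: det = 0")
    return [[Fraction(x, d) for x in row] for row in adjoint(m)]


A = [[3, 2, -1], [1, 6, 3], [2, -4, 0]]  # the matrix of Example 6
print(adjoint(A))
print(inverse(A))
```

The fractions are reduced automatically, so 12/64 is reported as 3/16. Cofactor expansion grows very quickly with n, so this approach is meant for small matrices and for illustrating the formula rather than for efficient computation.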

Cramer's  Rule 


Our next theorem uses the formula for the inverse of an invertible matrix to produce a formula, called Cramer's
rule, for the solution of a linear system Ax = b of n equations in n unknowns in the case where the coefficient
matrix A is invertible (or, equivalently, when det(A) ≠ 0).

THEOREM 2.3.7 Cramer's Rule

If Ax = b is a system of n linear equations in n unknowns such that det(A) ≠ 0, then the system has a
unique solution. This solution is

\[
x_1 = \frac{\det(A_1)}{\det(A)}, \quad x_2 = \frac{\det(A_2)}{\det(A)}, \quad \ldots, \quad x_n = \frac{\det(A_n)}{\det(A)}
\]

where A_j is the matrix obtained by replacing the entries in the jth column of A by the entries in the matrix

\[
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
\]

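Proof  If det(A) ≠ 0, then A is invertible, and by Theorem 1.6.2, x = A^{-1}b is the unique solution of Ax = b.
Therefore, by Theorem 2.3.6 we have

\[
\mathbf{x} = A^{-1}\mathbf{b} = \frac{1}{\det(A)} \, \mathrm{adj}(A)\mathbf{b}
= \frac{1}{\det(A)}
\begin{bmatrix}
C_{11} & C_{21} & \cdots & C_{n1} \\
C_{12} & C_{22} & \cdots & C_{n2} \\
\vdots & \vdots & & \vdots \\
C_{1n} & C_{2n} & \cdots & C_{nn}
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
\]

Multiplying the matrices out gives

\[
\mathbf{x} = \frac{1}{\det(A)}
\begin{bmatrix}
b_1C_{11} + b_2C_{21} + \cdots + b_nC_{n1} \\
b_1C_{12} + b_2C_{22} + \cdots + b_nC_{n2} \\
\vdots \\
b_1C_{1n} + b_2C_{2n} + \cdots + b_nC_{nn}
\end{bmatrix}
\]

The entry in the jth row of x is therefore

\[
x_j = \frac{b_1C_{1j} + b_2C_{2j} + \cdots + b_nC_{nj}}{\det(A)} \tag{11}
\]

Now let

\[
A_j = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1\,j-1} & b_1 & a_{1\,j+1} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2\,j-1} & b_2 & a_{2\,j+1} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{n\,j-1} & b_n & a_{n\,j+1} & \cdots & a_{nn}
\end{bmatrix}
\]

Since A_j differs from A only in the jth column, it follows that the cofactors of entries b_1, b_2, ..., b_n in A_j are the
same as the cofactors of the corresponding entries in the jth column of A. The cofactor expansion of det(A_j)
along the jth column is therefore

\[
\det(A_j) = b_1C_{1j} + b_2C_{2j} + \cdots + b_nC_{nj}
\]

Substituting this result in 11 gives

\[
x_j = \frac{\det(A_j)}{\det(A)}
\]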


EXAMPLE  8 Using  Cramer's  Rule  to  Solve  a Linear  System 

Use Cramer's rule to solve

\[
\begin{aligned}
x_1 \phantom{{}+ 4x_2} + 2x_3 &= 6 \\
-3x_1 + 4x_2 + 6x_3 &= 30 \\
-x_1 - 2x_2 + 3x_3 &= 8
\end{aligned}
\]

Gabriel  Cramer  (1704-1752) 


Variations  of  Cramer's  rule  were  fairly  well  known  before  the  Swiss 
mathematician  discussed  it  in  work  he  published  in  1750.  It  was  Cramer's  superior  notation 
that popularized the method and led mathematicians to attach his name to it.

Solution

\[
A = \begin{bmatrix} 1 & 0 & 2 \\ -3 & 4 & 6 \\ -1 & -2 & 3 \end{bmatrix}, \quad
A_1 = \begin{bmatrix} 6 & 0 & 2 \\ 30 & 4 & 6 \\ 8 & -2 & 3 \end{bmatrix},
\]
\[
A_2 = \begin{bmatrix} 1 & 6 & 2 \\ -3 & 30 & 6 \\ -1 & 8 & 3 \end{bmatrix}, \quad
A_3 = \begin{bmatrix} 1 & 0 & 6 \\ -3 & 4 & 30 \\ -1 & -2 & 8 \end{bmatrix}
\]

Therefore,

\[
x_1 = \frac{\det(A_1)}{\det(A)} = \frac{-40}{44} = \frac{-10}{11}, \quad
x_2 = \frac{\det(A_2)}{\det(A)} = \frac{72}{44} = \frac{18}{11}, \quad
x_3 = \frac{\det(A_3)}{\det(A)} = \frac{152}{44} = \frac{38}{11}
\]

For n > 3, it is usually more efficient to
solve a linear system with n equations in n
unknowns by Gauss-Jordan elimination
than by Cramer's rule. Its main use is for
obtaining properties of solutions of a
linear system without actually solving the
system.

Equivalence  Theorem 

In  Theorem  1.6.4  we  listed  five  results  that  are  equivalent  to  the  invertibility  of  a matrix  A.  We  conclude  this 
section  by  merging  Theorem  2.3.3  with  that  list  to  produce  the  following  theorem  that  relates  all  of  the  major 
topics  we  have  studied  thus  far. 


THEOREM 2.3.8 Equivalent Statements

If A is an n x n matrix, then the following statements are equivalent.

(a) A is invertible.

(b) Ax = 0 has only the trivial solution.

(c) The reduced row echelon form of A is I_n.

(d) A can be expressed as a product of elementary matrices.

(e) Ax = b is consistent for every n x 1 matrix b.

(f) Ax = b has exactly one solution for every n x 1 matrix b.

(g) det(A) ≠ 0.


OPTIONAL 

We now have all of the machinery necessary to prove the following two results, which we stated without proof
in Theorem 1.7.1:

Theorem 1.7.1(c) A triangular matrix is invertible if and only if its diagonal entries are all nonzero.

Theorem 1.7.1(d) The inverse of an invertible lower triangular matrix is lower triangular, and the inverse of an
invertible upper triangular matrix is upper triangular.

Proof of Theorem 1.7.1(c)  Let A = [a_{ij}] be a triangular matrix, so that its diagonal entries are

\[
a_{11}, a_{22}, \ldots, a_{nn}
\]

From Theorem 2.1.2, the matrix A is invertible if and only if

\[
\det(A) = a_{11}a_{22} \cdots a_{nn}
\]

is nonzero, which is true if and only if the diagonal entries are all nonzero.

Proof of Theorem 1.7.1(d)  We will prove the result for upper triangular matrices and leave the lower
triangular case for you. Assume that A is upper triangular and invertible. Since

\[
A^{-1} = \frac{1}{\det(A)} \, \mathrm{adj}(A)
\]

we can prove that A^{-1} is upper triangular by showing that adj(A) is upper triangular or, equivalently, that the
matrix of cofactors is lower triangular. We can do this by showing that every cofactor C_{ij} with i < j (i.e., above
the main diagonal) is zero. Since

\[
C_{ij} = (-1)^{i+j} M_{ij}
\]

it suffices to show that each minor M_{ij} with i < j is zero. For this purpose, let B_{ij} be the matrix that results when
the ith row and jth column of A are deleted, so

\[
M_{ij} = \det(B_{ij}) \tag{12}
\]

From the assumption that i < j, it follows that B_{ij} is upper triangular (see Figure 1.7.1). Since A is upper
triangular, its (i + 1)-st row begins with at least i zeros. But the ith row of B_{ij} is the (i + 1)-st row of A with the
entry in the jth column removed. Since i < j, none of the first i zeros is removed by deleting the jth column; thus
the ith row of B_{ij} starts with at least i zeros, which implies that this row has a zero on the main diagonal. It now
follows from Theorem 2.1.2 that det(B_{ij}) = 0 and from 12 that M_{ij} = 0.


Concept  Review 

Determinant  test  for  invertibility 
Matrix  of  cofactors 
Adjoint  of  a matrix 
Cramer's  rule 

Equivalent  statements  about  an  invertible  matrix 

Skills 

Know  how  determinants  behave  with  respect  to  basic  arithmetic  operations,  as  given  in  Equation  1, 
Theorem  2.3.1,  Lemma  2.3.2,  and  Theorem  2.3.4. 

Use  the  determinant  to  test  a matrix  for  invertibility. 


Know how det(A) and det(A^{-1}) are related.

Compute the matrix of cofactors for a square matrix A.

Compute adj(A) for a square matrix A.

Use  the  adjoint  of  an  invertible  matrix  to  find  its  inverse. 

Use  Cramer's  rule  to  solve  linear  systems  of  equations. 

Know  the  equivalent  characterizations  of  an  invertible  matrix  given  in  Theorem  2.3.8. 


Exercise  Set  2.3 

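In Exercises 1-4, verify that det(kA) = k^n det(A).

1.
\[
A = \begin{bmatrix} -1 & 2 \\ 3 & 4 \end{bmatrix}; \quad k = 2
\]

2.
\[
A = \begin{bmatrix} 2 & 2 \\ 5 & -2 \end{bmatrix}; \quad k = -4
\]

3.
\[
A = \begin{bmatrix} 2 & -1 & 3 \\ 3 & 2 & 1 \\ 1 & 4 & 5 \end{bmatrix}; \quad k = -2
\]

4.
\[
A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 0 & 1 & -2 \end{bmatrix}; \quad k = 3
\]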

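In Exercises 5-6, verify that det(AB) = det(BA) and determine whether the equality
det(A + B) = det(A) + det(B) holds.

5.
\[
A = \begin{bmatrix} 2 & 1 & 0 \\ 3 & 4 & 0 \\ 0 & 0 & 2 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 1 & -1 & 3 \\ 7 & 1 & 2 \\ 5 & 0 & 1 \end{bmatrix}
\]

6.
\[
A = \begin{bmatrix} -1 & 8 & 2 \\ 1 & 0 & -1 \\ -2 & 2 & 2 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 2 & -1 & -4 \\ 1 & 1 & 3 \\ 0 & 3 & -1 \end{bmatrix}
\]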

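In Exercises 7-14, use determinants to decide whether the given matrix is invertible.

7.
\[
A = \begin{bmatrix} 2 & 5 & 5 \\ -1 & -1 & 0 \\ 2 & 4 & 3 \end{bmatrix}
\]

Answer:

Invertible

8.
\[
A = \begin{bmatrix} 2 & 0 & 3 \\ 0 & 3 & 2 \\ -2 & 0 & -4 \end{bmatrix}
\]

9.
\[
A = \begin{bmatrix} 2 & -3 & 5 \\ 0 & 1 & -3 \\ 0 & 0 & 2 \end{bmatrix}
\]

Answer:

Invertible

10.
\[
A = \begin{bmatrix} -3 & 0 & 1 \\ 5 & 0 & 6 \\ 8 & 0 & 3 \end{bmatrix}
\]

11.
\[
A = \begin{bmatrix} 4 & 2 & 8 \\ -2 & 1 & -4 \\ 3 & 1 & 6 \end{bmatrix}
\]

Answer:

Not invertible

12.
\[
A = \begin{bmatrix} 1 & 0 & -1 \\ 9 & -1 & 4 \\ 8 & 9 & -1 \end{bmatrix}
\]

13.
\[
A = \begin{bmatrix} 2 & 0 & 0 \\ 8 & 1 & 0 \\ -5 & 3 & 6 \end{bmatrix}
\]

Answer:

Invertible

14.
\[
A = \begin{bmatrix} \sqrt{2} & -\sqrt{7} & 0 \\ 3\sqrt{2} & -3\sqrt{7} & 0 \\ 5 & -9 & 0 \end{bmatrix}
\]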

In  Exercises  15-18,  find  the  values  of  k for  which  A is  invertible. 


Answer: 


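17.
\[
A = \begin{bmatrix} 1 & 2 & 4 \\ 3 & 1 & 6 \\ k & 3 & 2 \end{bmatrix}
\]

Answer:

k ≠ -1

18.
\[
A = \begin{bmatrix} 1 & 2 & 0 \\ k & 1 & k \\ 0 & 2 & 1 \end{bmatrix}
\]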

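In Exercises 19-23, decide whether the given matrix is invertible, and if so, use the adjoint method to find its
inverse.

19.
\[
A = \begin{bmatrix} 2 & 5 & 5 \\ -1 & -1 & 0 \\ 2 & 4 & 3 \end{bmatrix}
\]

Answer:

\[
A^{-1} = \begin{bmatrix} 3 & -5 & -5 \\ -3 & 4 & 5 \\ 2 & -2 & -3 \end{bmatrix}
\]

20.
\[
A = \begin{bmatrix} 2 & 0 & 3 \\ 0 & 3 & 2 \\ -2 & 0 & -4 \end{bmatrix}
\]

21.
\[
A = \begin{bmatrix} 2 & -3 & 5 \\ 0 & 1 & -3 \\ 0 & 0 & 2 \end{bmatrix}
\]

Answer:

\[
A^{-1} = \begin{bmatrix} \frac{1}{2} & \frac{3}{2} & 1 \\ 0 & 1 & \frac{3}{2} \\ 0 & 0 & \frac{1}{2} \end{bmatrix}
\]

22.
\[
A = \begin{bmatrix} 2 & 0 & 0 \\ 8 & 1 & 0 \\ -5 & 3 & 6 \end{bmatrix}
\]

23.
\[
A = \begin{bmatrix} 1 & 3 & 1 & 1 \\ 2 & 5 & 2 & 2 \\ 1 & 3 & 8 & 9 \\ 1 & 3 & 2 & 2 \end{bmatrix}
\]

Answer:

\[
A^{-1} = \begin{bmatrix} -4 & 3 & 0 & -1 \\ 2 & -1 & 0 & 0 \\ -7 & 0 & -1 & 8 \\ 6 & 0 & 1 & -7 \end{bmatrix}
\]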

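In Exercises 24-29, solve by Cramer's rule, where it applies.

24.
\[
\begin{aligned}
7x_1 - 2x_2 &= 3 \\
3x_1 + x_2 &= 5
\end{aligned}
\]

25.
\[
\begin{aligned}
4x + 5y \phantom{{}+ 2z} &= 2 \\
11x + y + 2z &= 3 \\
x + 5y + 2z &= 1
\end{aligned}
\]

Answer:

\[
x = \frac{3}{11}, \quad y = \frac{2}{11}, \quad z = -\frac{1}{11}
\]

26.
\[
\begin{aligned}
x - 4y + z &= 6 \\
4x - y + 2z &= -1 \\
2x + 2y - 3z &= -20
\end{aligned}
\]

27.
\[
\begin{aligned}
x_1 - 3x_2 + x_3 &= 4 \\
2x_1 - x_2 \phantom{{}+ x_3} &= -2 \\
4x_1 \phantom{{}- x_2} - 3x_3 &= 0
\end{aligned}
\]

Answer:

\[
x_1 = -\frac{30}{11}, \quad x_2 = -\frac{38}{11}, \quad x_3 = -\frac{40}{11}
\]

28.
\[
\begin{aligned}
-x_1 - 4x_2 + 2x_3 + x_4 &= -32 \\
2x_1 - x_2 + 7x_3 + 9x_4 &= 14 \\
-x_1 + x_2 + 3x_3 + x_4 &= 11 \\
x_1 - 2x_2 + x_3 - 4x_4 &= -4
\end{aligned}
\]

29.
\[
\begin{aligned}
3x_1 - x_2 + x_3 &= 4 \\
-x_1 + 7x_2 - 2x_3 &= 1 \\
2x_1 + 6x_2 - x_3 &= 5
\end{aligned}
\]

Answer:

Cramer's rule does not apply.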
30. Show that the matrix

\[
A = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

is invertible for all values of θ; then find A^{-1} using Theorem 2.3.6.

31. Use Cramer's rule to solve for y without solving for the unknowns x, z, and w.

\[
\begin{aligned}
4x + y + z + w &= 6 \\
3x + 7y - z + w &= 1 \\
7x + 3y - 5z + 8w &= -3 \\
x + y + z + 2w &= 3
\end{aligned}
\]

Answer:

y = 0

32. Let Ax = b be the system in Exercise 31.

(a) Solve by Cramer's rule.

(b) Solve by Gauss-Jordan elimination.

(c) Which method involves fewer computations?

33. Prove that if det(A) = 1 and all the entries in A are integers, then all the entries in A^{-1} are integers.

34. Let Ax = b be a system of n linear equations in n unknowns with integer coefficients and integer constants.
Prove that if det(A) = 1, the solution x has integer entries.

35. Let

\[
A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}
\]

Assuming that det(A) = -7, find

(a) det(3A)

(b) det(A^{-1})

(c) det(2A^{-1})

(d) det((2A)^{-1})

(e) \[ \det \begin{bmatrix} a & g & d \\ b & h & e \\ c & i & f \end{bmatrix} \]

Answer:

(a) -189

(b) -1/7

(c) -8/7

(d) -1/56

(e) 7

36. In each part, find the determinant given that A is a 4 x 4 matrix for which det(A) = -2.

(a) det(-A)

(b) det(A^{-1})

(c) det(2A^T)

(d) det(A^3)

37. In each part, find the determinant given that A is a 3 x 3 matrix for which det(A) = 7.

(a) det(3A)

(b) det(A^{-1})

(c) det(2A^{-1})

(d) det((2A)^{-1})

Answer:

(a) 189

(b) 1/7

(c) 8/7

(d) 1/56

38. Prove that a square matrix A is invertible if and only if A^T A is invertible.

39. Show that if A is a square matrix, then det(A^T A) = det(A A^T).

True-False  Exercises 

In parts (a)-(l) determine whether the statement is true or false, and justify your answer.

(a) If A is a 3 x 3 matrix, then det(2A) = 2 det(A).

Answer:

False

(b) If A and B are square matrices of the same size such that det(A) = det(B), then det(A + B) = 2 det(A).

Answer:

False

(c) If A and B are square matrices of the same size and A is invertible, then

det(A^{-1}BA) = det(B)

Answer:

True

(d) A square matrix A is invertible if and only if det(A) = 0.

Answer:

False

(e) The matrix of cofactors of A is precisely [adj(A)]^T.

Answer:

True

(f) For every n x n matrix A, we have

A · adj(A) = (det(A)) I_n

Answer:

True

(g) If A is a square matrix and the linear system Ax = 0 has multiple solutions for x, then det(A) = 0.

Answer:

True

(h) If A is an n x n matrix and there exists an n x 1 matrix b such that the linear system Ax = b has no solutions,
then the reduced row echelon form of A cannot be I_n.

Answer:

True

(i) If E is an elementary matrix, then Ex = 0 has only the trivial solution.

Answer:

True

(j) If A is an invertible matrix, then the linear system Ax = 0 has only the trivial solution if and only if the linear
system A^{-1}x = 0 has only the trivial solution.

Answer:

True

(k) If A is invertible, then adj(A) must also be invertible.

Answer:

True

(l) If A has a row of zeros, then so does adj(A).

Answer:

False

Supplementary Exercises


In Exercises 1-8, evaluate the determinant of the given matrix by (a) cofactor expansion and (b) using
elementary row operations to introduce zeros into the matrix.

1.
\[
\begin{bmatrix} -4 & 2 \\ 3 & 3 \end{bmatrix}
\]

Answer:

-18

2. 


3.
\[
\begin{bmatrix} -1 & 5 & 2 \\ 0 & 2 & -1 \\ -3 & 1 & 1 \end{bmatrix}
\]

Answer:

24


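4.
\[
\begin{bmatrix} -1 & -2 & -3 \\ -4 & -5 & -6 \\ -7 & -8 & -9 \end{bmatrix}
\]

5.
\[
\begin{bmatrix} 3 & 0 & -1 \\ 1 & 1 & 1 \\ 0 & 4 & 2 \end{bmatrix}
\]

Answer:

-10

6.
\[
\begin{bmatrix} -5 & 1 & 4 \\ 3 & 0 & 2 \\ 1 & -2 & 2 \end{bmatrix}
\]

7.
\[
\begin{bmatrix} 3 & 6 & 0 & 1 \\ -2 & 3 & 1 & 4 \\ 1 & 0 & -1 & 1 \\ -9 & 2 & -2 & 2 \end{bmatrix}
\]

Answer:

329

8.
\[
\begin{bmatrix} -1 & -2 & -3 & -4 \\ 4 & 3 & 2 & 1 \\ 1 & 2 & 3 & 4 \\ -4 & -3 & -2 & -1 \end{bmatrix}
\]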

9. Evaluate the determinants in Exercises 3-6 by using the arrow technique (see Example 7 in Section 2.1).

Answer:

Exercise 3: 24; Exercise 4: 0; Exercise 5: -10; Exercise 6: -48

10. (a) Construct a 4 x 4 matrix whose determinant is easy to compute using cofactor expansion but hard to
evaluate using elementary row operations.

(b) Construct a 4 x 4 matrix whose determinant is easy to compute using elementary row operations but
hard to evaluate using cofactor expansion.

11. Use the determinant to decide whether the matrices in Exercises 1-4 are invertible.

Answer:

The matrices in Exercises 1-3 are invertible, the matrix in Exercise 4 is not.

12. Use the determinant to decide whether the matrices in Exercises 5-8 are invertible.

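In Exercises 13-15, find the determinant of the given matrix by any method.

13.
\[
\begin{bmatrix} 5 & b - 3 \\ b - 2 & -3 \end{bmatrix}
\]

Answer:

\[
-b^2 + 5b - 21
\]

14.
\[
\begin{bmatrix} 3 & -4 & a \\ a^2 & 1 & 2 \\ 2 & a - 1 & 4 \end{bmatrix}
\]

15.
\[
\begin{bmatrix} 0 & 0 & 0 & 0 & -3 \\ 0 & 0 & 0 & -4 & 0 \\ 0 & 0 & -1 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 5 & 0 & 0 & 0 & 0 \end{bmatrix}
\]

Answer:

-120

16. Solve for x.

\[
\begin{vmatrix} x & -1 \\ 3 & 1 - x \end{vmatrix}
= \begin{vmatrix} 1 & 0 & -3 \\ 2 & x & -6 \\ 1 & 3 & x - 5 \end{vmatrix}
\]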


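In Exercises 17-24, use the adjoint method (Theorem 2.3.6) to find the inverse of the given matrix, if it
exists.

17. The matrix in Exercise 1.

Answer:

\[
\begin{bmatrix} -\frac{1}{6} & \frac{1}{9} \\ \frac{1}{6} & \frac{2}{9} \end{bmatrix}
\]

18. The matrix in Exercise 2.

19. The matrix in Exercise 3.

Answer:

\[
\begin{bmatrix} \frac{1}{8} & -\frac{1}{8} & -\frac{3}{8} \\ \frac{1}{8} & \frac{5}{24} & -\frac{1}{24} \\ \frac{1}{4} & -\frac{7}{12} & -\frac{1}{12} \end{bmatrix}
\]

20. The matrix in Exercise 4.

21. The matrix in Exercise 5.

Answer:

\[
\begin{bmatrix} \frac{1}{5} & \frac{2}{5} & -\frac{1}{10} \\ \frac{1}{5} & -\frac{3}{5} & \frac{2}{5} \\ -\frac{2}{5} & \frac{6}{5} & -\frac{3}{10} \end{bmatrix}
\]

22. The matrix in Exercise 6.

23. The matrix in Exercise 7.

Answer:

\[
\begin{bmatrix}
\frac{10}{329} & -\frac{2}{329} & \frac{52}{329} & -\frac{27}{329} \\
\frac{55}{329} & -\frac{11}{329} & -\frac{43}{329} & \frac{16}{329} \\
-\frac{3}{47} & \frac{10}{47} & -\frac{25}{47} & -\frac{6}{47} \\
-\frac{31}{329} & \frac{72}{329} & \frac{102}{329} & -\frac{15}{329}
\end{bmatrix}
\]

24. The matrix in Exercise 8.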

25. Use Cramer's rule to solve for x' and y' in terms of x and y.


Answer: 


x 

y 


r 


/ 


*'  = |*  + ^y,  y'  = - JX  + 


26. Use Cramer's rule to solve for x' and y' in terms of x and y.

\[
\begin{aligned}
x &= x' \cos\theta - y' \sin\theta \\
y &= x' \sin\theta + y' \cos\theta
\end{aligned}
\]


27. By examining the determinant of the coefficient matrix, show that the following system has a nontrivial
solution if and only if α = β.

\[
\begin{aligned}
x + y + \alpha z &= 0 \\
x + y + \beta z &= 0 \\
\alpha x + \beta y + z &= 0
\end{aligned}
\]


28. Let A be a 3 x 3 matrix, each of whose entries is 1 or 0. What is the largest possible value for det(A)?

29. (a) For the triangle in the accompanying figure, use trigonometry to show that

\[
\begin{aligned}
b \cos\gamma + c \cos\beta &= a \\
c \cos\alpha + a \cos\gamma &= b \\
a \cos\beta + b \cos\alpha &= c
\end{aligned}
\]

and then apply Cramer's rule to show that

\[
\cos\alpha = \frac{b^2 + c^2 - a^2}{2bc}
\]

(b) Use Cramer's rule to obtain similar formulas for cos β and cos γ.

Figure Ex-29

Answer:

(b)
\[
\cos\beta = \frac{a^2 + c^2 - b^2}{2ac}, \quad \cos\gamma = \frac{a^2 + b^2 - c^2}{2ab}
\]

30. Use determinants to show that for all real values of λ, the only solution of

\[
\begin{aligned}
x - 2y &= \lambda x \\
x - y &= \lambda y
\end{aligned}
\]

is x = 0, y = 0.

31. Prove: If A is invertible, then adj(A) is invertible and

\[
[\mathrm{adj}(A)]^{-1} = \frac{1}{\det(A)} A = \mathrm{adj}(A^{-1})
\]

32. Prove: If A is an n x n matrix, then

\[
\det[\mathrm{adj}(A)] = [\det(A)]^{n-1}
\]

33. Prove: If the entries in each row of an n x n matrix A add up to zero, then the determinant of A is zero.
[Hint: Consider the product AX, where X is the n x 1 matrix, each of whose entries is one.]

34. (a) In the accompanying figure, the area of the triangle ABC can be expressed as

area ABC = area ADEC + area CEFB - area ADFB

Use this and the fact that the area of a trapezoid equals 1/2 the altitude times the sum of the parallel
sides to show that

\[
\text{area } ABC = \frac{1}{2} \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix}
\]

[Note: In the derivation of this formula, the vertices are labeled such that the triangle is traced
counterclockwise proceeding from (x_1, y_1) to (x_2, y_2) to (x_3, y_3). For a clockwise orientation, the
determinant above yields the negative of the area.]

(b) Use the result in (a) to find the area of the triangle with vertices (3, 3), (4, 0), (-2, -1).

35. Use the fact that 21,375, 38,798, 34,162, 40,223, and 79,154 are all divisible by 19 to show that

\[
\begin{vmatrix} 2 & 1 & 3 & 7 & 5 \\ 3 & 8 & 7 & 9 & 8 \\ 3 & 4 & 1 & 6 & 2 \\ 4 & 0 & 2 & 2 & 3 \\ 7 & 9 & 1 & 5 & 4 \end{vmatrix}
\]

is divisible by 19 without directly evaluating the determinant.

36. Without directly evaluating the determinant, show that

\[
\begin{vmatrix} \sin\alpha & \cos\alpha & \sin(\alpha + \delta) \\ \sin\beta & \cos\beta & \sin(\beta + \delta) \\ \sin\gamma & \cos\gamma & \sin(\gamma + \delta) \end{vmatrix} = 0
\]

CHAPTER 3


Euclidean  Vector  Spaces 


CHAPTER  CONTENTS 

Vectors  in  2-Space,  3-Space,  and  n-Space 

Norm, Dot Product, and Distance in R^n
Orthogonality 

The  Geometry  of  Linear  Systems 
Cross  Product 


INTRODUCTION 

Engineers  and  physicists  distinguish  between  two  types  of  physical  quantities — scalars, 
which  are  quantities  that  can  be  described  by  a numerical  value  alone,  and  vectors,  which 
are  quantities  that  require  both  a number  and  a direction  for  their  complete  physical 
description.  For  example,  temperature,  length,  and  speed  are  scalars  because  they  can  be 
fully  described  by  a number  that  tells  “how  much” — a temperature  of  20°C,  a length  of  5 
cm,  or  a speed  of  75  km/h.  In  contrast,  velocity  and  force  are  vectors  because  they  require 
a number  that  tells  “how  much”  and  a direction  that  tells  “which  way” — say,  a boat 
moving  at  10  knots  in  a direction  45°  northeast,  or  a force  of  100  lb  acting  vertically. 
Although  the  notions  of  vectors  and  scalars  that  we  will  study  in  this  text  have  their 
origins  in  physics  and  engineering,  we  will  be  more  concerned  with  using  them  to  build 
mathematical  structures  and  then  applying  those  structures  to  such  diverse  fields  as 
genetics,  computer  science,  economics,  telecommunications,  and  environmental  science. 
3.1 Vectors in 2-Space, 3-Space, and n-Space

Linear  algebra  is  concerned  with  two  kinds  of  mathematical  objects,  “matrices”  and  “vectors.”  We  are  already 
familiar  with  the  basic  ideas  about  matrices,  so  in  this  section  we  will  introduce  some  of  the  basic  ideas  about 
vectors.  As  we  progress  through  this  text  we  will  see  that  vectors  and  matrices  are  closely  related  and  that 
much  of  linear  algebra  is  concerned  with  that  relationship. 


Geometric  Vectors 

Engineers  and  physicists  represent  vectors  in  two  dimensions  (also  called  2-space)  or  in  three  dimensions 
(also  called  3-space)  by  arrows.  The  direction  of  the  arrowhead  specifies  the  direction  of  the  vector  and  the 
length  of  the  arrow  specifies  the  magnitude.  Mathematicians  call  these  geometric  vectors.  The  tail  of  the 
arrow  is  called  the  initial  point  of  the  vector  and  the  tip  the  terminal  point  (Figure  3.1.1). 

Terminal  point 


Initial  point 

Figure  3.1.1 

In  this  text  we  will  denote  vectors  in  boldface  type  such  as  a,  b,  v,  w,  and  x,  and  we  will  denote  scalars  in 
lowercase  italic  type  such  as  a,  k,  v,  w,  and  x.  When  we  want  to  indicate  that  a vector  v has  initial  point  A and 
terminal  point  B,  then,  as  shown  in  Figure  3.1.2,  we  will  write 

\[ \mathbf{v} = \overrightarrow{AB} \]

Figure 3.1.2

Vectors  with  the  same  length  and  direction,  such  as  those  in  Figure  3.1.3,  are  said  to  be  equivalent.  Since  we 
want  a vector  to  be  determined  solely  by  its  length  and  direction,  equivalent  vectors  are  regarded  to  be  the 
same  vector  even  though  they  may  be  in  different  positions.  Equivalent  vectors  are  also  said  to  be  equal, 
which  we  indicate  by  writing 


v = w 


Equivalent  vectors 


Figure  3.1.3 

The  vector  whose  initial  and  terminal  points  coincide  has  length  zero,  so  we  call  this  the  zero  vector  and 
denote  it  by  0.  The  zero  vector  has  no  natural  direction,  so  we  will  agree  that  it  can  be  assigned  any  direction 
that  is  convenient  for  the  problem  at  hand. 


Vector  Addition 

There  are  a number  of  important  algebraic  operations  on  vectors,  all  of  which  have  their  origin  in  laws  of 
physics. 


Parallelogram  Rule  for  Vector  Addition 


If v and w are vectors in 2-space or 3-space that are positioned so their initial points coincide, then the
two vectors form adjacent sides of a parallelogram, and the sum v + w is the vector represented by
the arrow from the common initial point of v and w to the opposite vertex of the parallelogram
(Figure 3.1.4a).


Figure 3.1.4

Here  is  another  way  to  form  the  sum  of  two  vectors. 


Triangle  Rule  for  Vector  Addition 

If v and w are vectors in 2-space or 3-space that are positioned so the initial point of w is at the
terminal point of v, then the sum v + w is represented by the arrow from the initial point of v to the
terminal point of w (Figure 3.1.4b).


In Figure 3.1.4c we have constructed the sums v + w and w + v by the triangle rule. This construction makes
it evident that

v + w = w + v   (1)

and  that  the  sum  obtained  by  the  triangle  rule  is  the  same  as  the  sum  obtained  by  the  parallelogram  rule. 

Vector  addition  can  also  be  viewed  as  a process  of  translating  points. 


Vector  Addition  Viewed  as  Translation 

If v, w, and v + w are positioned so their initial points coincide, then the terminal point of v + w can
be viewed in two ways:

1. The terminal point of v + w is the point that results when the terminal point of v is translated in
the direction of w by a distance equal to the length of w (Figure 3.1.5a).

2. The terminal point of v + w is the point that results when the terminal point of w is translated in
the direction of v by a distance equal to the length of v (Figure 3.1.5b).

Accordingly, we say that v + w is the translation of v by w or, alternatively, the translation of w by v.


Figure 3.1.5


Vector  Subtraction 

In ordinary arithmetic we can write a - b = a + (-b), which expresses subtraction in terms of addition.
There  is  an  analogous  idea  in  vector  arithmetic. 

Vector Subtraction

The negative of a vector v, denoted by -v, is the vector that has the same length as v but is
oppositely directed (Figure 3.1.6a), and the difference of v from w, denoted by w - v, is taken to be
the sum

w - v = w + (-v)   (2)


Figure 3.1.6


The difference of v from w can be obtained geometrically by the parallelogram method shown in Figure
3.1.6b, or more directly by positioning w and v so their initial points coincide and drawing the vector from the
terminal point of v to the terminal point of w (Figure 3.1.6c).


Scalar  Multiplication 

Sometimes  there  is  a need  to  change  the  length  of  a vector  or  change  its  length  and  reverse  its  direction.  This 
is  accomplished  by  a type  of  multiplication  in  which  vectors  are  multiplied  by  scalars.  As  an  example,  the 
product 2v denotes the vector that has the same direction as v but twice the length, and the product -2v
denotes the vector that is oppositely directed to v and has twice the length. Here is the general result.


Scalar  Multiplication 

If v is a nonzero vector in 2-space or 3-space, and if k is a nonzero scalar, then we define the scalar
product of v by k to be the vector whose length is |k| times the length of v and whose direction is the
same as that of v if k is positive and opposite to that of v if k is negative. If k = 0 or v = 0, then we
define kv to be 0.


Figure 3.1.7 shows the geometric relationship between a vector v and some of its scalar multiples. In
particular, observe that (-1)v has the same length as v but is oppositely directed; therefore,

(-1)v = -v   (3)


Figure  3.1.7 


Parallel  and  Collinear  Vectors 

Suppose that v and w are vectors in 2-space or 3-space with a common initial point. If one of the vectors is a
scalar multiple of the other, then the vectors lie on a common line, so it is reasonable to say that they are
collinear (Figure 3.1.8a). However, if we translate one of the vectors, as indicated in Figure 3.1.8b, then the
vectors are parallel but no longer collinear. This creates a linguistic problem because translating a vector does
not change it. The only way to resolve this problem is to agree that the terms parallel and collinear mean the
same thing when applied to vectors. Although the vector 0 has no clearly defined direction, we will regard it
as parallel to all vectors when convenient.


Figure 3.1.8


Sums  of  Three  or  More  Vectors 

Vector addition satisfies the associative law for addition, meaning that when we add three vectors, say u, v,
and w, it does not matter which two we add first; that is,

u + (v + w) = (u + v) + w

It follows from this that there is no ambiguity in the expression u + v + w because the same result is obtained
no matter how the vectors are grouped.

A simple way to construct u + v + w is to place the vectors “tip to tail” in succession and then draw the
vector from the initial point of u to the terminal point of w (Figure 3.1.9a). The tip-to-tail method also works
for four or more vectors (Figure 3.1.9b). The tip-to-tail method also makes it evident that if u, v, and w are
vectors in 3-space with a common initial point, then u + v + w is the diagonal of the parallelepiped that has
the three vectors as adjacent sides (Figure 3.1.9c).
Vectors in Coordinate Systems

Up  until  now  we  have  discussed  vectors  without  reference  to  a coordinate  system.  However,  as  we  will  soon 
see,  computations  with  vectors  are  much  simpler  to  perform  if  a coordinate  system  is  present  to  work  with. 

The  component  forms  of  the  zero  vector  are 
0 = (0,  0)  in  2-space  and  0 = (0,  0,  0)  in 
3-space. 


If a vector v in 2-space or 3-space is positioned with its initial point at the origin of a rectangular coordinate
system, then the vector is completely determined by the coordinates of its terminal point (Figure 3.1.10). We
call these coordinates the components of v relative to the coordinate system. We will write v = (v1, v2) to
denote a vector v in 2-space with components (v1, v2), and v = (v1, v2, v3) to denote a vector v in 3-space
with components (v1, v2, v3).


Figure 3.1.10

It  should  be  evident  geometrically  that  two  vectors  in  2-space  or  3-space  are  equivalent  if  and  only  if  they 
have  the  same  terminal  point  when  their  initial  points  are  at  the  origin.  Algebraically,  this  means  that  two 
vectors  are  equivalent  if  and  only  if  their  corresponding  components  are  equal.  Thus,  for  example,  the  vectors 

v = (v1, v2, v3) and w = (w1, w2, w3)
in 3-space are equivalent if and only if

v1 = w1, v2 = w2, v3 = w3

It may have occurred to you that an ordered pair (v1, v2) can represent either a vector with
components v1 and v2 or a point with components v1 and v2 (and similarly for ordered triples). Both are valid
geometric interpretations, so the appropriate choice will depend on the geometric viewpoint that we want to
emphasize (Figure 3.1.11).

Figure 3.1.11 The ordered pair (v1, v2) can represent a point or a vector.


Vectors  Whose  Initial  Point  Is  Not  at  the  Origin 

It is sometimes necessary to consider vectors whose initial points are not at the origin. If P1P2 denotes the
vector with initial point P1(x1, y1) and terminal point P2(x2, y2), then the components of this vector are
given by the formula

\[ \overrightarrow{P_1P_2} = (x_2 - x_1,\; y_2 - y_1) \qquad (4) \]

That is, the components of the vector P1P2 are obtained by subtracting the coordinates of the initial point from the
coordinates of the terminal point. For example, in Figure 3.1.12 the vector P1P2 is the difference of vectors
OP2 and OP1, so

\[ \overrightarrow{P_1P_2} = \overrightarrow{OP_2} - \overrightarrow{OP_1} = (x_2, y_2) - (x_1, y_1) = (x_2 - x_1,\; y_2 - y_1) \]

As you might expect, the components of a vector in 3-space that has initial point P1(x1, y1, z1) and terminal
point P2(x2, y2, z2) are given by

\[ \overrightarrow{P_1P_2} = (x_2 - x_1,\; y_2 - y_1,\; z_2 - z_1) \qquad (5) \]


Figure  3.1.12 


EXAMPLE  1 Finding  the  Components  of  a Vector 

The components of the vector v = P1P2 with initial point P1(2, -1, 4) and terminal point
P2(7, 5, -8) are

v = (7 - 2, 5 - (-1), (-8) - 4) = (5, 6, -12)
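
Formulas (4) and (5) amount to one coordinate-by-coordinate subtraction, which can be sketched in a few lines of Python. The function name `components` and the use of tuples for points are illustrative choices, not part of the text.

```python
def components(initial, terminal):
    """Return the components of the vector from `initial` to `terminal`.

    Both points are tuples of coordinates of the same length
    (2 for 2-space, 3 for 3-space).
    """
    if len(initial) != len(terminal):
        raise ValueError("both points must have the same number of coordinates")
    # Subtract each initial coordinate from the matching terminal coordinate.
    return tuple(t - i for i, t in zip(initial, terminal))


# The points P1(2, -1, 4) and P2(7, 5, -8) from Example 1.
v = components((2, -1, 4), (7, 5, -8))
print(v)  # (5, 6, -12)
```

The same function handles 2-space and 3-space because it never assumes a particular number of coordinates.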


n-Space 

The  idea  of  using  ordered  pairs  and  triples  of  real  numbers  to  represent  points  in  two-dimensional  space  and 
three-dimensional  space  was  well  known  in  the  eighteenth  and  nineteenth  centuries.  By  the  dawn  of  the 
twentieth  century,  mathematicians  and  physicists  were  exploring  the  use  of  “higher-dimensional”  spaces  in 
mathematics  and  physics.  Today,  even  the  layman  is  familiar  with  the  notion  of  time  as  a fourth  dimension,  an 
idea  used  by  Albert  Einstein  in  developing  the  general  theory  of  relativity.  Today,  physicists  working  in  the 
field of “string theory” commonly use 11-dimensional space in their quest for a unified theory that will
explain how the fundamental forces of nature work. Much of the remaining work in this section is concerned
with extending the notion of space to n dimensions.

To  explore  these  ideas  further,  we  start  with  some  terminology  and  notation.  The  set  of  all  real  numbers  can 
be viewed geometrically as a line. It is called the real line and is denoted by R or R^1. The superscript
reinforces the intuitive idea that a line is one-dimensional. The set of all ordered pairs of real numbers (called
2-tuples) and the set of all ordered triples of real numbers (called 3-tuples) are denoted by R^2 and R^3,
respectively.  The  superscript  reinforces  the  idea  that  the  ordered  pairs  correspond  to  points  in  the  plane 
(two-dimensional)  and  ordered  triples  to  points  in  space  (three-dimensional).  The  following  definition  extends 
this  idea. 


DEFINITION  1 

If n is a positive integer, then an ordered n-tuple is a sequence of n real numbers (v1, v2, ..., vn).
The set of all ordered n-tuples is called n-space and is denoted by R^n.


You can think of the numbers in an n-tuple (v1, v2, ..., vn) as either the coordinates of a
generalized  point  or  the  components  of  a generalized  vector,  depending  on  the  geometric  image  you  want  to 
bring  to  mind — the  choice  makes  no  difference  mathematically,  since  it  is  the  algebraic  properties  of  n-tuples 
that  are  of  concern. 


Here are some typical applications that lead to n-tuples.


Experimental  Data  A scientist  performs  an  experiment  and  makes  n numerical  measurements  each  time 
the  experiment  is  performed.  The  result  of  each  experiment  can  be  regarded  as  a vector 
y = (y1, y2, ..., yn) in R^n in which y1, y2, ..., yn are the measured values.

Storage  and  Warehousing  A national  trucking  company  has  15  depots  for  storing  and  servicing  its  trucks. 
At  each  point  in  time  the  distribution  of  trucks  in  the  service  depots  can  be  described  by  a 15-tuple 
x = (x1, x2, ..., x15) in which x1 is the number of trucks in the first depot, x2 is the number in the second
depot,  and  so  forth. 

Electrical Circuits A certain kind of processing chip is designed to receive four input voltages and
produce three output voltages in response. The input voltages can be regarded as vectors in R^4 and the
output voltages as vectors in R^3. Thus, the chip can be viewed as a device that transforms an input vector
v = (v1, v2, v3, v4) in R^4 into an output vector w = (w1, w2, w3) in R^3.

Graphical  Images  One  way  in  which  color  images  are  created  on  computer  screens  is  by  assigning  each 
pixel  (an  addressable  point  on  the  screen)  three  numbers  that  describe  the  hue,  saturation,  and  brightness 
of  the  pixel.  Thus,  a complete  color  image  can  be  viewed  as  a set  of  5-tuples  of  the  form  v = (x,  y,  h,  s,  b) 
in which x and y are the screen coordinates of a pixel and h, s, and b are its hue, saturation, and brightness.

Economics  One  approach  to  economic  analysis  is  to  divide  an  economy  into  sectors  (manufacturing, 
services,  utilities,  and  so  forth)  and  measure  the  output  of  each  sector  by  a dollar  value.  Thus,  in  an 
economy  with  10  sectors  the  economic  output  of  the  entire  economy  can  be  represented  by  a 10-tuple 
s = (s1, s2, ..., s10) in which the numbers s1, s2, ..., s10 are the outputs of the individual sectors.

Mechanical  Systems  Suppose  that  six  particles  move  along  the  same  coordinate  line  so  that  at  time  t their 
coordinates are x1, x2, ..., x6 and their velocities are v1, v2, ..., v6, respectively. This information can be
represented by the vector

v = (x1, x2, x3, x4, x5, x6, v1, v2, v3, v4, v5, v6, t)

in R^13. This vector is called the state of the particle system at time t.


The German-born physicist Albert Einstein immigrated to the United States in
1935,  where  he  settled  at  Princeton  University.  Einstein  spent  the  last  three  decades  of  his  life 
working  unsuccessfully  at  producing  a unified  field  theory  that  would  establish  an  underlying  link 
between  the  forces  of  gravity  and  electromagnetism.  Recently,  physicists  have  made  progress  on  the 
problem  using  a framework  known  as  string  theory.  In  this  theory  the  smallest,  indivisible 
components  of  the  Universe  are  not  particles  but  loops  that  behave  like  vibrating  strings.  Whereas 


Einstein's space-time universe was four-dimensional, strings reside in an 11-dimensional world that is
the  focus  of  current  research. 

[Image: © Bettmann/© Corbis]


Operations  on  Vectors  in  Rn 

Our next goal is to define useful operations on vectors in R^n. These operations will all be natural extensions
of the familiar operations on vectors in R^2 and R^3. We will denote a vector v in R^n using the notation

v = (v1, v2, ..., vn)

and  we  will  call  0 = (0,  0, ...,  0)  the  zero  vector. 

We noted earlier that in R^2 and R^3 two vectors are equivalent (equal) if and only if their corresponding
components  are  the  same.  Thus,  we  make  the  following  definition. 


DEFINITION  2 

Vectors v = (v1, v2, ..., vn) and w = (w1, w2, ..., wn) in R^n are said to be equivalent (also called
equal) if

v1 = w1, v2 = w2, ..., vn = wn

We  indicate  this  by  writing  v = w. 


EXAMPLE  2 Equality  of  Vectors 

(a, b, c, d) = (1, -4, 2, 7)
if and only if a = 1, b = -4, c = 2, and d = 7.


Our  next  objective  is  to  define  the  operations  of  addition,  subtraction,  and  scalar  multiplication  for  vectors  in 
R^n. To motivate these ideas, we will consider how these operations can be performed on vectors in R^2 using
components. By studying Figure 3.1.13 you should be able to deduce that if v = (v1, v2) and w = (w1, w2),
then


v + w = (v1 + w1, v2 + w2)   (6)

kv = (kv1, kv2)   (7)

In particular, it follows from 7 that

-v = (-1)v = (-v1, -v2)   (8)

and hence that

w - v = w + (-v) = (w1 - v1, w2 - v2)   (9)

Motivated by Formulas 6-9, we make the following definition.


DEFINITION  3 

If v = (v1, v2, ..., vn) and w = (w1, w2, ..., wn) are vectors in R^n, and if k is any scalar, then we
define

v + w = (v1 + w1, v2 + w2, ..., vn + wn)   (10)

kv = (kv1, kv2, ..., kvn)   (11)

-v = (-v1, -v2, ..., -vn)   (12)

w - v = w + (-v) = (w1 - v1, w2 - v2, ..., wn - vn)   (13)

In  words,  vectors  are  added  (or  subtracted)  by 
adding  (or  subtracting)  their  corresponding 
components,  and  a vector  is  multiplied  by  a 
scalar  by  multiplying  each  component  by  that 
scalar. 

EXAMPLE  3 Algebraic  Operations  Using  Components 

If v = (1, -3, 2) and w = (4, 2, 1), then

v + w = (5, -1, 3), 2v = (2, -6, 4)

-w = (-4, -2, -1), v - w = v + (-w) = (-3, -5, 1)
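
As a rough illustration of Definition 3, the componentwise operations can be written as small Python functions on tuples. The function names (`add`, `scale`, `negate`, `subtract`) are hypothetical, chosen only for this sketch; subtraction is built from addition and negation, mirroring Formula 13.

```python
def add(v, w):
    """Componentwise sum of two vectors with the same number of components."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of components")
    return tuple(a + b for a, b in zip(v, w))


def scale(k, v):
    """Multiply every component of v by the scalar k."""
    return tuple(k * a for a in v)


def negate(v):
    """The negative of v, computed as (-1)v."""
    return scale(-1, v)


def subtract(a, b):
    """The difference a - b, computed as a + (-b)."""
    return add(a, negate(b))


# The vectors from Example 3.
v = (1, -3, 2)
w = (4, 2, 1)
print(add(v, w))       # (5, -1, 3)
print(scale(2, v))     # (2, -6, 4)
print(negate(w))       # (-4, -2, -1)
print(subtract(v, w))  # (-3, -5, 1)
```

Nothing here depends on the vectors having three components, so the same functions apply to vectors in R^n for any n.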


The  following  theorem  summarizes  the  most  important  properties  of  vector  operations. 


THEOREM  3.1.1 

If u, v, and w are vectors in R^n, and if k and m are scalars, then:

(a) u + v = v + u

(b) (u + v) + w = u + (v + w)

(c) u + 0 = 0 + u = u

(d) u + (-u) = 0

(e) k(u + v) = ku + kv

(f) (k + m)u = ku + mu

(g) k(mu) = (km)u

(h) 1u = u


We will prove part (b) and leave some of the other proofs as exercises.

(b) Let u = (u1, u2, ..., un), v = (v1, v2, ..., vn), and w = (w1, w2, ..., wn). Then

(u + v) + w = ((u1, u2, ..., un) + (v1, v2, ..., vn)) + (w1, w2, ..., wn)

= (u1 + v1, u2 + v2, ..., un + vn) + (w1, w2, ..., wn)   [Vector addition]

= ((u1 + v1) + w1, (u2 + v2) + w2, ..., (un + vn) + wn)   [Vector addition]

= (u1 + (v1 + w1), u2 + (v2 + w2), ..., un + (vn + wn))   [Regroup]

= (u1, u2, ..., un) + (v1 + w1, v2 + w2, ..., vn + wn)   [Vector addition]

= u + (v + w)


The  following  additional  properties  of  vectors  in  Rn  can  be  deduced  easily  by  expressing  the  vectors  in  terms 
of  components  (verify). 


THEOREM  3.1.2 

If v is a vector in R^n and k is a scalar, then:

(a) 0v = 0

(b) k0 = 0

(c) (-1)v = -v


Calculating  Without  Components 

One  of  the  powerful  consequences  of  Theorems  3.1.1  and  3.1.2  is  that  they  allow  calculations  to  be  performed 
without  expressing  the  vectors  in  terms  of  components.  For  example,  suppose  that  x,  a,  and  b are  vectors  in 
R^n, and we want to solve the vector equation x + a = b for the vector x without using components. We could
proceed as follows:

x + a = b   [Given]

(x + a) + (-a) = b + (-a)   [Add the negative of a to both sides]

x + (a + (-a)) = b - a   [Part (b) of Theorem 3.1.1]

x + 0 = b - a   [Part (d) of Theorem 3.1.1]

x = b - a   [Part (c) of Theorem 3.1.1]

While  this  method  is  obviously  more  cumbersome  than  computing  with  components  in  Rn,  it  will  become 
important  later  in  the  text  where  we  will  encounter  more  general  kinds  of  vectors. 


Linear  Combinations 

Addition,  subtraction,  and  scalar  multiplication  are  frequently  used  in  combination  to  form  new  vectors.  For 
example, if v1, v2, and v3 are vectors in R^n, then the vectors

u = 2v1 + 3v2 + v3 and w = 7v1 - 6v2 + 8v3
are formed in this way. In general, we make the following definition.


DEFINITION  4 

If w is a vector in R^n, then w is said to be a linear combination of the vectors v1, v2, ..., vr in R^n if it
can be expressed in the form

w = k1v1 + k2v2 + ... + krvr   (14)

where k1, k2, ..., kr are scalars. These scalars are called the coefficients of the linear combination. In
the case where r = 1, Formula 14 becomes w = k1v1, so that a linear combination of a single vector
is just a scalar multiple of that vector.


Note  that  this  definition  of  a linear  combination 
is  consistent  with  that  given  in  the  context  of 
matrices (see Definition 6 in Section 1.3).
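
To make Formula 14 concrete, here is a minimal Python sketch that forms a linear combination from a list of coefficients and a list of vectors. The function name `linear_combination` and the sample vectors v1, v2, v3 are made up for illustration; only the coefficients 2, 3, 1 and 7, -6, 8 come from the text.

```python
def linear_combination(coefficients, vectors):
    """Return k1*v1 + k2*v2 + ... + kr*vr as a tuple of components."""
    if not vectors:
        raise ValueError("at least one vector is required")
    if len(coefficients) != len(vectors):
        raise ValueError("need exactly one coefficient per vector")
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all vectors must have the same number of components")
    # The i-th component of the result is k1*v1[i] + k2*v2[i] + ... + kr*vr[i].
    return tuple(
        sum(k * v[i] for k, v in zip(coefficients, vectors))
        for i in range(n)
    )


# Hypothetical vectors in R^3, used only to exercise the function.
v1, v2, v3 = (1, 0, 2), (0, 1, -1), (3, 3, 3)
u = linear_combination((2, 3, 1), (v1, v2, v3))   # u = 2v1 + 3v2 + v3
w = linear_combination((7, -6, 8), (v1, v2, v3))  # w = 7v1 - 6v2 + 8v3
print(u)  # (5, 6, 4)
print(w)  # (31, 18, 44)
```

With a single vector and a single coefficient the function reduces to scalar multiplication, matching the r = 1 case noted in Definition 4.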


Application  of  Linear  Combinations  to  Color  Models 

Colors  on  computer  monitors  are  commonly  based  on  what  is  called  the  RGB  color  model.  Colors  in 
this  system  are  created  by  adding  together  percentages  of  the  primary  colors  red  (R),  green  (G),  and 
blue  (B).  One  way  to  do  this  is  to  identify  the  primary  colors  with  the  vectors 

r=  (1,  0,  0)  (pure  red), 
g=  (0,1,0)  (pure  green), 
b = (0,  0,  1)  (pure  blue) 

in R^3 and to create all other colors by forming linear combinations of r, g, and b using coefficients
between  0 and  1,  inclusive;  these  coefficients  represent  the  percentage  of  each  pure  color  in  the  mix. 
The  set  of  all  such  color  vectors  is  called  RGB  space  or  the  RGB  color  cube  (Figure  3.1.14).  Thus, 
each  color  vector  c in  this  cube  is  expressible  as  a linear  combination  of  the  form 

c = k1r + k2g + k3b

= k1(1, 0, 0) + k2(0, 1, 0) + k3(0, 0, 1)

= (k1, k2, k3)

where 0 ≤ ki ≤ 1. As indicated in the figure, the corners of the cube represent the pure primary colors
together with the colors black, white, magenta, cyan, and yellow. The vectors along the diagonal
running from black to white correspond to shades of gray.

Figure 3.1.14
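
The color model above can be sketched in Python as a linear combination of the three primary vectors with coefficients restricted to the interval from 0 to 1. The names `RED`, `GREEN`, `BLUE`, and `rgb_color` are illustrative assumptions, not part of any graphics library.

```python
RED = (1, 0, 0)    # r, pure red
GREEN = (0, 1, 0)  # g, pure green
BLUE = (0, 0, 1)   # b, pure blue


def rgb_color(k1, k2, k3):
    """Return the color vector c = k1*r + k2*g + k3*b in the RGB color cube."""
    for k in (k1, k2, k3):
        if not 0 <= k <= 1:
            raise ValueError("each coefficient must lie between 0 and 1, inclusive")
    return tuple(k1 * r + k2 * g + k3 * b for r, g, b in zip(RED, GREEN, BLUE))


print(rgb_color(1, 1, 0))        # (1, 1, 0), the yellow corner of the cube
print(rgb_color(0.5, 0.5, 0.5))  # (0.5, 0.5, 0.5), a gray on the black-to-white diagonal
```

Because r, g, and b are the standard unit vectors, the result is simply (k1, k2, k3), which is the last line of the derivation above.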


Alternative  Notations  for  Vectors 


Up  to  now  we  have  been  writing  vectors  in  Rn  using  the  notation 


v = (v1, v2, ..., vn)   (15)

We call this the comma-delimited form. However, since a vector in R^n is just a list of its n components in a
specific order, any notation that displays those components in the correct order is a valid way of representing
the vector. For example, the vector in 15 can be written as

v = [v1 v2 ... vn]   (16)

which is called row-matrix form, or as

\[
\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \qquad (17)
\]

which  is  called  column-matrix  form.  The  choice  of  notation  is  often  a matter  of  taste  or  convenience,  but 
sometimes  the  nature  of  a problem  will  suggest  a preferred  notation.  Notations  15,  16,  and  17  will  all  be  used 
at  various  places  in  this  text. 


Concept  Review 

Geometric  vector 
Direction 
Length 
Initial  point 
Terminal  point 
Equivalent  vectors 


Zero  vector 

Vector  addition:  parallelogram  rule  and  triangle  rule 

Vector  subtraction 

Negative  of  a vector 

Scalar  multiplication 

Collinear  (i.e.,  parallel)  vectors 

Components  of  a vector 

Coordinates  of  a point 

n-tuple

n-space

Vector operations in n-space: addition, subtraction, scalar multiplication
Linear  combination  of  vectors 

Skills 

Perform  geometric  operations  on  vectors:  addition,  subtraction,  and  scalar  multiplication. 
Perform  algebraic  operations  on  vectors:  addition,  subtraction,  and  scalar  multiplication. 
Determine  whether  two  vectors  are  equivalent. 

Determine  whether  two  vectors  are  collinear. 

Sketch  vectors  whose  initial  and  terminal  points  are  given. 

Find  components  of  a vector  whose  initial  and  terminal  points  are  given. 

Prove  basic  algebraic  properties  of  vectors  (Theorems  3.1.1  and  3.1.2). 


Exercise  Set  3.1 

In Exercises 1-2, draw a coordinate system (as in Figure 3.1.10) and locate the points whose coordinates are
given.

1. (a) (3, 4, 5)

(b) (-3, 4, 5)

(c) (3, -4, 5)

(d) (3, 4, -5)

(e) (-3, -4, 5)

(f) (-3, 4, -5)

Answer:

[Sketches for parts (a)-(f); figures not reproduced.]

2. (a) (0, 3, -3)

(b) (3, -3, 0)

(c) (-3, 0, 0)

(d) (3, 0, 3)

(e) (0, 0, -3)

(f) (0, 3, 0)

In  Exercises  3-4,  sketch  the  following  vectors  with  the  initial  points  located  at  the  origin. 

3. (a) v1 = (3, 6)

(b) v2 = (-4, -8)

(c) v3 = (-4, -3)

(d) v4 = (3, 4, 5)

(e) v5 = (3, 3, 0)

(f) v6 = (-1, 0, 2)

Answer:

[Sketches; figures not reproduced.]


4. (a) v1 = (5, -4)

(b) v2 = (3, 0)

(c) v3 = (0, -7)

(d) v4 = (0, 0, -3)

(e) v5 = (0, 4, -1)

(f) v6 = (2, 2, 2)

In  Exercises  5-6,  sketch  the  following  vectors  with  the  initial  points  located  at  the  origin. 


5. (a) P1(4, 8), P2(3, 7)

(b) P1(3, -5), P2(-4, -7)

(c) P1(3, -7, 2), P2(-2, 5, -4)

Answer:

[Sketches for parts (a)-(c); figures not reproduced.]


6. (a) P1(-5, 0), P2(-3, 1)

(b) P1(0, 0), P2(3, 4)

(c) P1(-1, 0, 2), P2(0, -1, 0)

(d) P1(2, 2, 2), P2(0, 0, 0)

In Exercises 7-8, find the components of the vector P1P2.

7. (a) P1(3, 5), P2(2, 8)

(b) P1(5, -2, 1), P2(2, 4, 2)

Answer:

(a) P1P2 = (-1, 3)

(b) P1P2 = (-3, 6, 1)

8. (a) P1(-6, 2), P2(-4, -1)

(b) P1(0, 0, 0), P2(-1, 6, 1)

9. (a) Find the terminal point of the vector that is equivalent to u = (1, 2) and whose initial point is A(1, 1).

(b) Find the initial point of the vector that is equivalent to u = (1, 1, 3) and whose terminal point is
B(-1, -1, 2).

Answer:

(a) The terminal point is B(2, 3).

(b) The initial point is A(-2, -2, -1).

10. (a) Find the initial point of the vector that is equivalent to u = (1, 2) and whose terminal point is B(2, 0).

(b) Find the terminal point of the vector that is equivalent to u = (1, 1, 3) and whose initial point is
A(0, 2, 0).

11. Find a nonzero vector u with terminal point Q(3, 0, -5) such that

(a) u has the same direction as v = (4, -2, -1).

(b) u is oppositely directed to v = (4, -2, -1).

Answer:

(a) u = (-1, 2, -4) is one possible answer.

(b) u = (7, -2, -6) is one possible answer.

12. Find a nonzero vector u with initial point P(-1, 3, -5) such that

(a) u has the same direction as v = (6, 7, -3).

(b) u is oppositely directed to v = (6, 7, -3).

13. Let u = (4, -1), v = (0, 5), and w = (-3, -3). Find the components of

(a) u + w

(b) v - 3u

(c) 2(u - 5w)

(d) 3v - 2(u + 2w)

(e) -3(w - 2u + v)

(f) (-2u - v) - 5(v + 3w)

Answer:

(a) u + w = (1, -4)

(b) v - 3u = (-12, 8)

(c) 2(u - 5w) = (38, 28)

(d) 3v - 2(u + 2w) = (4, 29)

(e) -3(w - 2u + v) = (33, -12)

(f) (-2u - v) - 5(v + 3w) = (37, 17)

14. Let u = (-3, 1, 2), v = (4, 0, -8), and w = (6, -1, -4). Find the components of

(a) v - w

(b) 6u + 2v

(c) -v + u

(d) 5(v - 4u)

(e) -3(v - 8w)

(f) (2u - 7w) - (8v + u)

15. Let u = (-3, 2, 1, 0), v = (4, 7, -3, 2), and w = (5, -2, 8, 1). Find the components of

(a) v - w

(b) 2u + 7v

(c) -u + (v - 4w)

(d) 6(u - 3v)

(e) -v - w

(f) (6v - w) - (4u + v)

Answer:

(a) (-1, 9, -11, 1)

(b) (22, 53, -19, 14)

(c) (-13, 13, -36, -2)

(d) (-90, -114, 60, -36)

(e) (-9, -5, -5, -3)

(f) (27, 29, -27, 9)

16. Let u, v, and w be the vectors in Exercise 15. Find the vector x that satisfies 5x - 2v = 2(w - 5x).

17. Let u = (5, -1, 0, 3, -3), v = (-1, -1, 7, 2, 0), and w = (-4, 2, -3, -5, 2). Find the
components of

(a) w - u

(b) 2v + 3u

(c) -w + 3(v - u)

(d) 5(-v + 4u - w)

(e) -2(3w + v) + (2u + w)

(f) (1/2)(w - 5v + 2u) + v

Answer:

(a) w - u = (-9, 3, -3, -8, 5)

(b) 2v + 3u = (13, -5, 14, 13, -9)

(c) -w + 3(v - u) = (-14, -2, 24, 2, 7)

(d) 5(-v + 4u - w) = (125, -25, -20, 75, -70)

(e) -2(3w + v) + (2u + w) = (32, -10, 1, 27, -16)

(f) (1/2)(w - 5v + 2u) + v = (9/2, 3/2, -12, -5/2, -2)

18. Let u = (1, 2, -3, 5, 0), v = (0, 4, -1, 1, 2), and w = (7, 1, -4, -2, 3). Find the components of

(a) v + w

(b) 3(2u - v)

(c) (3u - v) - (2u + 4w)

19.  Let  u=  ( — 3,  1,  2,  4,  4),  v=(4,  0,  —8,1,2),  andw=(6,  —1,  —4,3,  — 5).  Find  the  components 
of 

(a)  v-w 

(b)  6u  + 2v 

(c)  (2u  - 7w)  - (8v  + u) 

Answer: 

(a)  v-w=  (-2,  1,  -4,  -2,7) 

(b)  6u  + 2v=  ( — 10,  6,  -4,26,28) 

(c)  (2u - 7w) - (8v + u) = (-77, 8, 94, -25, 23)

20.  Let u, v, and w be the vectors in Exercise 18. Find the components of the vector x that satisfies the
equation 3u + v - 2w = 3x + 2w.

21.  Let u, v, and w be the vectors in Exercise 19. Find the components of the vector x that satisfies the
equation 2u - v + x = 7x + w.

Answer:

x = (-8/3, 1/2, 8/3, 2/3, 11/6)

22.  For  what  value(s)  of  t,  if  any,  is  the  given  vector  parallel  to  u = (4,  — 1)? 

(a)  (8t, -2)

(b)  (8t, 2t)

(c)  (1, t^2)

23.  Which of the following vectors in R^6 are parallel to u = (-2, 1, 0, 3, 5, 1)?

(a)  (4,2,0,6,10,2) 

(b)  (4,  -2,0,  -6,  -10,  -2) 

(c)  (0,  0,  0,  0,  0,  0) 


Answer: 


(a)  Not  parallel 

(b)  Parallel 


(c)  Parallel 


24.  Let u = (2, 1, 0, 1, -1) and v = (-2, 3, 1, 0, 2). Find scalars a and b so that
au + bv = (-8, 8, 3, -1, 7).

25.  Let u = (1, -1, 3, 5) and v = (2, 1, 0, -3). Find scalars a and b so that au + bv = (1, -4, 9, 18).

Answer:

a = 3,  b = -1

26.  Find all scalars c_1, c_2, and c_3 such that

c_1(1, 2, 0) + c_2(2, 1, 1) + c_3(0, 3, 1) = (0, 0, 0)

27.  Find all scalars c_1, c_2, and c_3 such that

c_1(1, -1, 0) + c_2(3, 2, 1) + c_3(0, 1, 4) = (-1, 1, 19)


Answer:

c_1 = 2,  c_2 = -1,  c_3 = 5

28.  Find all scalars c_1, c_2, and c_3 such that

c_1(-1, 0, 2) + c_2(2, 2, -2) + c_3(1, -2, 1) = (-6, 12, 4)

29.  Let u_1 = (-1, 3, 2, 0), u_2 = (2, 0, 4, -1), u_3 = (7, 1, 1, 4), and u_4 = (6, 3, 1, 2). Find scalars c_1,
c_2, c_3, and c_4 such that c_1u_1 + c_2u_2 + c_3u_3 + c_4u_4 = (0, 5, 6, -3).


Answer:

c_1 = 1,  c_2 = 1,  c_3 = -1,  c_4 = 1

30.  Show that there do not exist scalars c_1, c_2, and c_3 such that

c_1(1, 0, 1, 0) + c_2(1, 0, -2, 1) + c_3(2, 0, 1, 2) = (1, -2, 2, 3)

31.  Show that there do not exist scalars c_1, c_2, and c_3 such that

c_1(-2, 9, 6) + c_2(-3, 2, 1) + c_3(1, 7, 5) = (0, 5, 4)

32.  Consider  Figure  3.1.12.  Discuss  a geometric  interpretation  of  the  vector 

$\mathbf{u} = \overrightarrow{OP_1} + \frac{1}{2}\left(\overrightarrow{OP_2} - \overrightarrow{OP_1}\right)$

33.  Let  P be  the  point  (2,  3,  — 2)  and  Q the  point  (7,  — 4,  1 ) . 

(a)  Find  the  midpoint  of  the  line  segment  connecting  P and  Q. 

(b)  Find the point on the line segment connecting P and Q that is 3/4 of the way from P to Q.


Answer:

(a)  (9/2, -1/2, -1/2)

(b)  (23/4, -9/4, 1/4)

34.  Let P be the point (1, 3, 7). If the point (4, 0, -6) is the midpoint of the line segment connecting P and
Q, what is Q?

35.  Prove  parts  ( a ),  (c),  and  ( d)  of  Theorem  3.1.1. 

36.  Prove parts (e)-(h) of Theorem 3.1.1.

37.  Prove  parts  (a)-(c)  of  Theorem  3.1.2. 

True-False  Exercises 

In  parts  (a)-(k)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  Two  equivalent  vectors  must  have  the  same  initial  point. 

Answer: 

False 

(b)  The  vectors  (a,  b)  and  (a,  b,  0)  are  equivalent. 

Answer: 

False 

(c)  If k is a scalar and v is a vector, then v and kv are parallel if and only if k > 0.

Answer: 

False 

(d)  The vectors v + (u + w) and (w + v) + u are the same.

Answer: 

True 

(e)  If u + v = u + w, then v = w.

Answer: 

True 

(f)  If a and b are scalars such that au + bv = 0, then u and v are parallel vectors.

Answer: 

False 

(g)  Collinear  vectors  with  the  same  length  are  equal. 

Answer: 

False 

(h)  If (a, b, c) + (x, y, z) = (x, y, z), then (a, b, c) must be the zero vector.


Answer: 


True 

(i)  If  k and  m are  scalars  and  u and  v are  vectors,  then 

(k + m)(u + v) = ku + mv


Answer: 

False 

(j)  If  the  vectors  v and  w are  given,  then  the  vector  equation 

3(2v - x) = 5x - 4w + v

can  be  solved  for  x. 

Answer: 

True 

(k)  The linear combinations a_1v_1 + a_2v_2 and b_1v_1 + b_2v_2 can only be equal if a_1 = b_1 and a_2 = b_2.

Answer: 

False

3.2  Norm, Dot Product, and Distance in R^n

In this section we will be concerned with the notions of length and distance as they relate to vectors. We will
first discuss these ideas in R^2 and R^3 and then extend them algebraically to R^n.


Norm  of  a Vector 

In this text we will denote the length of a vector v by the symbol $\|\mathbf{v}\|$, which is read as the norm of v, the
length of v, or the magnitude of v (the term "norm" being a common mathematical synonym for length). As
suggested in Figure 3.2.1a, it follows from the Theorem of Pythagoras that the norm of a vector $(v_1, v_2)$ in R^2
is

$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2}$  (1)

Similarly, for a vector $(v_1, v_2, v_3)$ in R^3, it follows from Figure 3.2.1b and two applications of the Theorem of
Pythagoras that

$\|\mathbf{v}\|^2 = (OR)^2 + (RP)^2 = (OQ)^2 + (QR)^2 + (RP)^2 = v_1^2 + v_2^2 + v_3^2$

and hence that

$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + v_3^2}$  (2)

Motivated by the pattern of Formulas 1 and 2 we make the following definition.


DEFINITION  1 

If $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ is a vector in R^n, then the norm of v (also called the length of v or the
magnitude of v) is denoted by $\|\mathbf{v}\|$, and is defined by the formula

$\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + v_3^2 + \cdots + v_n^2}$  (3)


EXAMPLE  1 Calculating  Norms 

It follows from Formula 2 that the norm of the vector v = (-3, 2, 1) in R^3 is

$\|\mathbf{v}\| = \sqrt{(-3)^2 + 2^2 + 1^2} = \sqrt{14}$

and it follows from Formula 3 that the norm of the vector v = (2, -1, 3, -5) in R^4 is

$\|\mathbf{v}\| = \sqrt{2^2 + (-1)^2 + 3^2 + (-5)^2} = \sqrt{39}$
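
Formula 3 translates directly into a few lines of code. The following is a minimal sketch in Python; the function name `norm` is an illustrative choice, not notation from the text. It recomputes the two norms of Example 1.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a sequence of numbers (Formula 3)."""
    return math.sqrt(sum(x * x for x in v))

# The two vectors from Example 1.
print(norm((-3, 2, 1)))      # equals sqrt(14)
print(norm((2, -1, 3, -5)))  # equals sqrt(39)
```

The same function works for any number of components, which is the point of Definition 1.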


Figure 3.2.1


Our first theorem in this section will generalize to R^n the following three familiar facts about vectors in R^2 and
R^3:

Distances are nonnegative.

The zero vector is the only vector of length zero.

Multiplying a vector by a scalar multiplies its length by the absolute value of that scalar.

It is important to recognize that just because these results hold in R^2 and R^3 does not guarantee that they hold
in R^n; their validity in R^n must be proved using algebraic properties of n-tuples.


THEOREM  3.2.1 

If  v is  a vector  in  Rn,  and  if  k is  any  scalar,  then: 

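(a)  $\|\mathbf{v}\| \ge 0$

(b)  $\|\mathbf{v}\| = 0$ if and only if v = 0

(c)  $\|k\mathbf{v}\| = |k|\,\|\mathbf{v}\|$

We will prove part (c) and leave (a) and (b) as exercises.

Proof (c)  If $\mathbf{v} = (v_1, v_2, \ldots, v_n)$, then $k\mathbf{v} = (kv_1, kv_2, \ldots, kv_n)$, so

$\|k\mathbf{v}\| = \sqrt{(kv_1)^2 + (kv_2)^2 + \cdots + (kv_n)^2}$

$= \sqrt{k^2\,(v_1^2 + v_2^2 + \cdots + v_n^2)}$

$= |k|\sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$

$= |k|\,\|\mathbf{v}\|$


Unit Vectors
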

A vector of norm 1 is called a unit vector. Such vectors are useful for specifying a direction when length is not
relevant to the problem at hand. You can obtain a unit vector in a desired direction by choosing any nonzero
vector v in that direction and multiplying v by the reciprocal of its length. For example, if v is a vector of
length 2 in R^2 or R^3, then $\frac{1}{2}\mathbf{v}$ is a unit vector in the same direction as v. More generally, if v is any nonzero
vector in R^n, then

$\mathbf{u} = \frac{1}{\|\mathbf{v}\|}\mathbf{v}$  (4)

defines a unit vector that is in the same direction as v. We can confirm that 4 is a unit vector by applying part
(c) of Theorem 3.2.1 with $k = 1/\|\mathbf{v}\|$ to obtain

$\|\mathbf{u}\| = \|k\mathbf{v}\| = |k|\,\|\mathbf{v}\| = k\|\mathbf{v}\| = \frac{1}{\|\mathbf{v}\|}\|\mathbf{v}\| = 1$

The  process  of  multiplying  a nonzero  vector  by  the  reciprocal  of  its  length  to  obtain  a unit  vector  is  called 
normalizing  v. 

WARNING 


Sometimes you will see Formula 4 expressed as

$\mathbf{u} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$

This is just a more compact way of writing that
formula and is not intended to convey that v is
being divided by $\|\mathbf{v}\|$.


EXAMPLE  2 Normalizing  a Vector 

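Find the unit vector u that has the same direction as v = (2, 2, -1).

Solution  The vector v has length

$\|\mathbf{v}\| = \sqrt{2^2 + 2^2 + (-1)^2} = 3$

Thus, from 4,

$\mathbf{u} = \frac{1}{3}(2, 2, -1) = \left(\frac{2}{3}, \frac{2}{3}, -\frac{1}{3}\right)$

As a check, you may want to confirm that $\|\mathbf{u}\| = 1$.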
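
The normalizing step can be sketched in code as follows. This is an illustration only: the name `normalize` and the choice to raise an error for the zero vector are assumptions made here, reflecting the fact that Formula 4 requires a nonzero vector.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a sequence of numbers."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Return the unit vector (1/||v||) v from Formula 4."""
    length = norm(v)
    if length == 0:
        # Formula 4 only applies to nonzero vectors.
        raise ValueError("the zero vector cannot be normalized")
    return tuple(x / length for x in v)

u = normalize((2, 2, -1))   # the vector from Example 2
print(u)
print(norm(u))              # should be 1 up to round-off
```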


The  Standard  Unit  Vectors 


When a rectangular coordinate system is introduced in R^2 or R^3, the unit vectors in the positive directions of
the coordinate axes are called the standard unit vectors. In R^2 these vectors are denoted by

i = (1, 0)  and  j = (0, 1)

and in R^3 by

i = (1, 0, 0),  j = (0, 1, 0),  and  k = (0, 0, 1)

(Figure 3.2.2). Every vector $\mathbf{v} = (v_1, v_2)$ in R^2 and every vector $\mathbf{v} = (v_1, v_2, v_3)$ in R^3 can be expressed as a
linear combination of standard unit vectors by writing

$\mathbf{v} = (v_1, v_2) = v_1(1, 0) + v_2(0, 1) = v_1\mathbf{i} + v_2\mathbf{j}$  (5)

$\mathbf{v} = (v_1, v_2, v_3) = v_1(1, 0, 0) + v_2(0, 1, 0) + v_3(0, 0, 1) = v_1\mathbf{i} + v_2\mathbf{j} + v_3\mathbf{k}$  (6)

Moreover, we can generalize these formulas to R^n by defining the standard unit vectors in R^n to be

$\mathbf{e}_1 = (1, 0, 0, \ldots, 0),\quad \mathbf{e}_2 = (0, 1, 0, \ldots, 0),\quad \ldots,\quad \mathbf{e}_n = (0, 0, 0, \ldots, 1)$  (7)

in which case every vector $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ in R^n can be expressed as

$\mathbf{v} = (v_1, v_2, \ldots, v_n) = v_1\mathbf{e}_1 + v_2\mathbf{e}_2 + \cdots + v_n\mathbf{e}_n$  (8)

EXAMPLE  3 Linear  Combinations  of  Standard  Unit  Vectors 

(2, -3, 4) = 2i - 3j + 4k

(7, 3, -4, 5) = 7e_1 + 3e_2 - 4e_3 + 5e_4

Figure 3.2.2  (standard unit vectors, with labeled points (1, 0, 0), (0, 1, 0), and (0, 0, 1))

Distance in R^n

If $P_1$ and $P_2$ are points in R^2 or R^3, then the length of the vector $\overrightarrow{P_1P_2}$ is equal to the distance d between the
two points (Figure 3.2.3). Specifically, if $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$ are points in R^2, then Formula 4 of
Section 3.1 implies that

$d = \|\overrightarrow{P_1P_2}\| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$  (9)

This is the familiar distance formula from analytic geometry. Similarly, the distance between the points
$P_1(x_1, y_1, z_1)$ and $P_2(x_2, y_2, z_2)$ in 3-space is

$d(\mathbf{u}, \mathbf{v}) = \|\overrightarrow{P_1P_2}\| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$  (10)

Motivated  by  Formulas  9 and  10,  we  make  the  following  definition. 


DEFINITION  2 

If $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are points in R^n, then we denote the distance between
u and v by d(u, v) and define it to be

$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_n - v_n)^2}$  (11)


Figure 3.2.3


We  noted  in  the  previous  section  that  n-tuples 
can  be  viewed  either  as  vectors  or  points  in  Rn. 
In  Definition  2 we  chose  to  describe  them  as 
points,  as  that  seemed  the  more  natural 
interpretation. 


EXAMPLE  4 Calculating  Distance  in  Rn 

If

u = (1, 3, -2, 7)  and  v = (0, 7, 2, 2)

then the distance between u and v is

$d(\mathbf{u}, \mathbf{v}) = \sqrt{(1 - 0)^2 + (3 - 7)^2 + (-2 - 2)^2 + (7 - 2)^2} = \sqrt{58}$
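
As a quick check of Formula 11, here is a small Python sketch. The name `distance` is chosen for illustration, and the length check is an added safeguard rather than part of Definition 2.

```python
import math

def distance(u, v):
    """Euclidean distance d(u, v) = ||u - v|| between two points of R^n (Formula 11)."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same number of components")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# The points from Example 4.
d = distance((1, 3, -2, 7), (0, 7, 2, 2))
print(d)   # equals sqrt(58)
```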


Dot  Product 

Our next objective is to define a useful multiplication operation on vectors in R^2 and R^3 and then extend that
operation to R^n. To do this we will first need to define exactly what we mean by the "angle" between two
vectors in R^2 or R^3. For this purpose, let u and v be nonzero vectors in R^2 or R^3 that have been positioned so
that their initial points coincide. We define the angle between u and v to be the angle θ determined by u and v
that satisfies the inequalities $0 \le \theta \le \pi$ (Figure 3.2.4).


DEFINITION  3 

If u and v are nonzero vectors in R^2 or R^3, and if θ is the angle between u and v, then the dot product
(also called the Euclidean inner product) of u and v is denoted by $\mathbf{u} \cdot \mathbf{v}$ and is defined as

$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta$  (12)

If u = 0 or v = 0, then we define $\mathbf{u} \cdot \mathbf{v}$ to be 0.


Figure 3.2.4  The angle θ between u and v satisfies $0 \le \theta \le \pi$.


The sign of the dot product reveals information about the angle θ that we can obtain by rewriting Formula 12
as

$\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}$  (13)

Since $0 \le \theta \le \pi$, it follows from Formula 13 and properties of the cosine function studied in trigonometry that

• θ is acute if $\mathbf{u} \cdot \mathbf{v} > 0$.

• θ is obtuse if $\mathbf{u} \cdot \mathbf{v} < 0$.

• θ = π/2 if $\mathbf{u} \cdot \mathbf{v} = 0$.


EXAMPLE  5 Dot  Product 


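Find the dot product of the vectors shown in Figure 3.2.5.

Figure 3.2.5  (vectors u and v, with labeled points (0, 0, 1) and (0, 2, 2))

Solution  The lengths of the vectors are

$\|\mathbf{u}\| = 1$  and  $\|\mathbf{v}\| = \sqrt{8} = 2\sqrt{2}$

and the cosine of the angle θ between them is

$\cos(45^\circ) = 1/\sqrt{2}$

Thus, it follows from Formula 12 that

$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta = (1)(2\sqrt{2})(1/\sqrt{2}) = 2$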


EXAMPLE  6 A Geometry  Problem  Solved  Using  Dot  Product 


Find  the  angle  between  a diagonal  of  a cube  and  one  of  its  edges. 


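Solution  Let k be the length of an edge and introduce a coordinate system as shown in Figure 3.2.6.
If we let $\mathbf{u}_1 = (k, 0, 0)$, $\mathbf{u}_2 = (0, k, 0)$, and $\mathbf{u}_3 = (0, 0, k)$, then the vector

$\mathbf{d} = (k, k, k) = \mathbf{u}_1 + \mathbf{u}_2 + \mathbf{u}_3$

is a diagonal of the cube. It follows from Formula 13 that the angle θ between d and the edge $\mathbf{u}_1$
satisfies

$\cos\theta = \frac{\mathbf{u}_1 \cdot \mathbf{d}}{\|\mathbf{u}_1\|\,\|\mathbf{d}\|} = \frac{k^2}{(k)\left(\sqrt{3k^2}\right)} = \frac{1}{\sqrt{3}}$

With the help of a calculator we obtain

$\theta = \cos^{-1}\left(\frac{1}{\sqrt{3}}\right) \approx 54.74^\circ$

Figure 3.2.6  (cube with labeled points (k, 0, 0), (0, k, 0), (0, 0, k), and (k, k, k))


Note that the angle θ obtained in Example 6
does not involve k. Why was this to be
expected?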
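
The calculator step in Example 6 can be reproduced with a short Python sketch. The helper names `dot`, `norm`, and `angle_degrees` are illustrative, and the edge lengths tried below are arbitrary sample values; the computed angle comes out the same for each of them, in line with the margin note.

```python
import math

def dot(u, v):
    """Sum of products of corresponding components."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def angle_degrees(u, v):
    """Angle between nonzero vectors u and v, from Formula 13, in degrees."""
    c = dot(u, v) / (norm(u) * norm(v))
    c = max(-1.0, min(1.0, c))   # guard against round-off outside [-1, 1]
    return math.degrees(math.acos(c))

for k in (1, 2.5, 10):           # arbitrary sample edge lengths
    edge = (k, 0, 0)
    diagonal = (k, k, k)
    print(k, angle_degrees(edge, diagonal))   # about 54.74 degrees each time
```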


Component  Form  of  the  Dot  Product 

For  computational  purposes  it  is  desirable  to  have  a formula  that  expresses  the  dot  product  of  two  vectors  in 
terms  of  components.  We  will  derive  such  a formula  for  vectors  in  3-space;  the  derivation  for  vectors  in 
2-space  is  similar. 


Historical Note  The dot product notation was first introduced by the American physicist and
mathematician J. Willard Gibbs in a pamphlet distributed to his students at Yale University in the
1880s. The product was originally written on the baseline, rather than centered as today, and was
referred to as the direct product. Gibbs's pamphlet was eventually incorporated into a book entitled
Vector Analysis that was published in 1901 and coauthored with one of his students. Gibbs made major
contributions to the fields of thermodynamics and electromagnetic theory and is generally regarded as
the greatest American physicist of the nineteenth century.

Josiah Willard Gibbs (1839-1903)

[Image:  The  Granger  Collection,  New  York]

Let $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$ be two nonzero vectors. If, as shown in Figure 3.2.7, θ is the angle
between u and v, then the law of cosines yields

$\|\overrightarrow{PQ}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta$  (14)

Since $\overrightarrow{PQ} = \mathbf{v} - \mathbf{u}$, we can rewrite 14 as

$\|\mathbf{u}\|\,\|\mathbf{v}\|\cos\theta = \frac{1}{2}\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - \|\mathbf{v} - \mathbf{u}\|^2\right)$

or

$\mathbf{u} \cdot \mathbf{v} = \frac{1}{2}\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - \|\mathbf{v} - \mathbf{u}\|^2\right)$

Substituting

$\|\mathbf{u}\|^2 = u_1^2 + u_2^2 + u_3^2,\qquad \|\mathbf{v}\|^2 = v_1^2 + v_2^2 + v_3^2$

and

$\|\mathbf{v} - \mathbf{u}\|^2 = (v_1 - u_1)^2 + (v_2 - u_2)^2 + (v_3 - u_3)^2$

we obtain, after simplifying,

$\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + u_3v_3$  (15)


Although we derived Formula 15 and its
2-space companion under the assumption that u
and v are nonzero, it turns out that these
formulas are also applicable if u = 0 or v = 0
(verify).

The companion formula for vectors in 2-space is

$\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2$  (16)

Motivated by the pattern in Formulas 15 and 16, we make the following definition.


DEFINITION  4 

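If $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are vectors in R^n, then the dot product (also called the
Euclidean inner product) of u and v is denoted by $\mathbf{u} \cdot \mathbf{v}$ and is defined by

$\mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n$  (17)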


In  words,  to  calculate  the  dot  product 
(Euclidean  inner  product)  multiply 
corresponding  components  and  add  the 
resulting  products. 


EXAMPLE  7 Calculating  Dot  Products  Using  Components 

(a)  Use Formula 15 to compute the dot product of the vectors u and v in Example 5.

(b)  Calculate $\mathbf{u} \cdot \mathbf{v}$ for the following vectors in R^4:

u = (-1, 3, 5, 7),  v = (-3, -4, 1, 0)


Solution (a)  The component forms of the vectors are u = (0, 0, 1) and v = (0, 2, 2). Thus,

$\mathbf{u} \cdot \mathbf{v} = (0)(0) + (0)(2) + (1)(2) = 2$

which agrees with the result obtained geometrically in Example 5.

Solution (b)

$\mathbf{u} \cdot \mathbf{v} = (-1)(-3) + (3)(-4) + (5)(1) + (7)(0) = -4$


Figure 3.2.7  (vectors u and v with angle θ between them; labeled points $P(u_1, u_2, u_3)$ and $Q(v_1, v_2, v_3)$)
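
The agreement between the geometric definition (Formula 12) and the component formula (Formula 17) can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names are made up here, and the 45 degree angle is the one read from Figure 3.2.5 in Example 5.

```python
import math

def dot(u, v):
    """Component form of the dot product (Formula 17)."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

# Example 7(a): the vectors of Example 5 in component form.
u, v = (0, 0, 1), (0, 2, 2)
component_value = dot(u, v)
geometric_value = norm(u) * norm(v) * math.cos(math.radians(45))  # Formula 12
print(component_value, geometric_value)   # both are 2, up to round-off

# Example 7(b): vectors in R^4.
print(dot((-1, 3, 5, 7), (-3, -4, 1, 0)))  # -4
```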


Algebraic  Properties  of  the  Dot  Product 


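In the special case where u = v in Definition 4, we obtain the relationship

$\mathbf{v} \cdot \mathbf{v} = v_1^2 + v_2^2 + \cdots + v_n^2 = \|\mathbf{v}\|^2$  (18)

This yields the following formula for expressing the length of a vector in terms of a dot product:

$\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}}$  (19)

Dot products have many of the same algebraic properties as products of real numbers.

THEOREM  3.2.2

If u, v, and w are vectors in R^n, and if k is a scalar, then:

(a)  $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$  [Symmetry property]

(b)  $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$  [Distributive property]

(c)  $k(\mathbf{u} \cdot \mathbf{v}) = (k\mathbf{u}) \cdot \mathbf{v}$  [Homogeneity property]

(d)  $\mathbf{v} \cdot \mathbf{v} \ge 0$ and $\mathbf{v} \cdot \mathbf{v} = 0$ if and only if v = 0  [Positivity property]

We will prove parts (c) and (d) and leave the other proofs as exercises.

Proof (c)  Let $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$. Then

$k(\mathbf{u} \cdot \mathbf{v}) = k(u_1v_1 + u_2v_2 + \cdots + u_nv_n)$

$= (ku_1)v_1 + (ku_2)v_2 + \cdots + (ku_n)v_n = (k\mathbf{u}) \cdot \mathbf{v}$


Proof (d)  The result follows from parts (a) and (b) of Theorem 3.2.1 and the fact that

$\mathbf{v} \cdot \mathbf{v} = v_1v_1 + v_2v_2 + \cdots + v_nv_n = v_1^2 + v_2^2 + \cdots + v_n^2 = \|\mathbf{v}\|^2$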

The  next  theorem  gives  additional  properties  of  dot  products.  The  proofs  can  be  obtained  either  by  expressing 
the  vectors  in  terms  of  components  or  by  using  the  algebraic  properties  established  in  Theorem  3.2.2. 


THEOREM  3.2.3 

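If u, v, and w are vectors in R^n, and if k is a scalar, then:

(a)  $\mathbf{0} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{0} = 0$

(b)  $(\mathbf{u} + \mathbf{v}) \cdot \mathbf{w} = \mathbf{u} \cdot \mathbf{w} + \mathbf{v} \cdot \mathbf{w}$

(c)  $\mathbf{u} \cdot (\mathbf{v} - \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} - \mathbf{u} \cdot \mathbf{w}$

(d)  $(\mathbf{u} - \mathbf{v}) \cdot \mathbf{w} = \mathbf{u} \cdot \mathbf{w} - \mathbf{v} \cdot \mathbf{w}$

(e)  $k(\mathbf{u} \cdot \mathbf{v}) = \mathbf{u} \cdot (k\mathbf{v})$

We will show how Theorem 3.2.2 can be used to prove part (b) without breaking the vectors into components.
The other proofs are left as exercises.

Proof (b)

$(\mathbf{u} + \mathbf{v}) \cdot \mathbf{w} = \mathbf{w} \cdot (\mathbf{u} + \mathbf{v})$  [By symmetry]

$= \mathbf{w} \cdot \mathbf{u} + \mathbf{w} \cdot \mathbf{v}$  [By distributivity]

$= \mathbf{u} \cdot \mathbf{w} + \mathbf{v} \cdot \mathbf{w}$  [By symmetry]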

Formulas  18  and  19  together  with  Theorems  3.2.2  and  3.2.3  make  it  possible  to  manipulate  expressions 
involving  dot  products  using  familiar  algebraic  techniques. 

EXAMPLE  8 Calculating  with  Dot  Products 

(u  — 2v)  • (3u  + 4v)  = u • (3u  + 4v)  — 2v  • (3u  + 4v) 

= 3(u  • u)  +4(u  • v)  — 6(v  • u)  — 8(v  • v) 

= 3||u||2  — 2(u  • v)  — 8 ||v|| 2 


Cauchy-Schwarz Inequality and Angles in R^n


Our next objective is to extend to R^n the notion of "angle" between nonzero vectors u and v. We will do this
by starting with the formula

$\theta = \cos^{-1}\left(\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right)$  (20)

which we previously derived for nonzero vectors in R^2 and R^3. Since dot products and norms have been
defined for vectors in R^n, it would seem that this formula has all the ingredients to serve as a definition of the
angle θ between two vectors, u and v, in R^n. However, there is a fly in the ointment, the problem being that the
inverse cosine in Formula 20 is not defined unless its argument satisfies the inequalities

$-1 \le \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \le 1$  (21)


Fortunately, these inequalities do hold for all nonzero vectors in R^n as a result of the following fundamental
result known as the Cauchy-Schwarz inequality.


THEOREM  3.2.4  Cauchy-Schwarz Inequality

If $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are vectors in R^n, then

$|\mathbf{u} \cdot \mathbf{v}| \le \|\mathbf{u}\|\,\|\mathbf{v}\|$  (22)

or in terms of components

$|u_1v_1 + u_2v_2 + \cdots + u_nv_n| \le \left(u_1^2 + u_2^2 + \cdots + u_n^2\right)^{1/2}\left(v_1^2 + v_2^2 + \cdots + v_n^2\right)^{1/2}$  (23)


We will omit the proof of this theorem because later in the text we will prove a more general version of which
this will be a special case. Our goal for now will be to use this theorem to prove that the inequalities in 21 hold
for all nonzero vectors in R^n. Once that is done we will have established all the results required to use Formula
20 as our definition of the angle between nonzero vectors u and v in R^n.


To prove that the inequalities in 21 hold for all nonzero vectors in R^n, divide both sides of Formula 22 by the
product $\|\mathbf{u}\|\,\|\mathbf{v}\|$ to obtain

$\frac{|\mathbf{u} \cdot \mathbf{v}|}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \le 1$

or equivalently

$\left|\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right| \le 1$

from which 21 follows.
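
A numerical experiment is no substitute for the proof, but it can make Formulas 20 through 22 concrete. The following Python sketch tests the Cauchy-Schwarz inequality on randomly generated vectors and then uses Formula 20 to compute an angle in R^4. The helper names, the random seed, the sample sizes, and the round-off tolerance are all arbitrary choices made for this illustration.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def angle(u, v):
    """Angle in radians between nonzero vectors u and v (Formula 20)."""
    c = dot(u, v) / (norm(u) * norm(v))
    # Inequality 21 says c lies in [-1, 1]; clamp only to absorb round-off.
    return math.acos(max(-1.0, min(1.0, c)))

random.seed(0)  # arbitrary seed, used only to make the run repeatable
cauchy_schwarz_holds = True
for _ in range(1000):
    n = random.randint(1, 8)
    u = [random.uniform(-10, 10) for _ in range(n)]
    v = [random.uniform(-10, 10) for _ in range(n)]
    # Formula 22, with a small relative tolerance for floating-point error.
    if abs(dot(u, v)) > norm(u) * norm(v) * (1 + 1e-12):
        cauchy_schwarz_holds = False

print(cauchy_schwarz_holds)
print(math.degrees(angle((1, 3, -2, 7), (0, 7, 2, 2))))
```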


Historical Note  The Cauchy-Schwarz inequality is named in honor of the French mathematician
Augustin Cauchy (see p. 109) and the German mathematician Hermann Schwarz. Variations of this
inequality occur in many different settings and under various names. Depending on the context in
which the inequality occurs, you may find it called Cauchy's inequality, the Schwarz inequality, or
sometimes even the Bunyakovsky inequality, in recognition of the Russian mathematician who
published his version of the inequality in 1859, about 25 years before Schwarz.

Hermann Amandus Schwarz (1843-1921)

Viktor Yakovlevich Bunyakovsky (1804-1889)

[Images:  wikipedia  (Schwarz);  wikipedia  (Bunyakovsky)]


Geometry  in  Rn 

Earlier in this section we extended various concepts to R^n with the idea that familiar results that we can
visualize in R^2 and R^3 might be valid in R^n as well. Here are two fundamental theorems from plane geometry
whose validity extends to R^n:

The sum of the lengths of two sides of a triangle is at least as large as the third (Figure 3.2.8).

The shortest distance between two points is a straight line (Figure 3.2.9).

The following theorem generalizes these theorems to R^n.


THEOREM  3.2.5 


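If u, v, and w are vectors in R^n, and if k is any scalar, then:

(a)  $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$  [Triangle inequality for vectors]

(b)  $d(\mathbf{u}, \mathbf{v}) \le d(\mathbf{u}, \mathbf{w}) + d(\mathbf{w}, \mathbf{v})$  [Triangle inequality for distances]


Proof (a)

$\|\mathbf{u} + \mathbf{v}\|^2 = (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) = (\mathbf{u} \cdot \mathbf{u}) + 2(\mathbf{u} \cdot \mathbf{v}) + (\mathbf{v} \cdot \mathbf{v})$

$= \|\mathbf{u}\|^2 + 2(\mathbf{u} \cdot \mathbf{v}) + \|\mathbf{v}\|^2$

$\le \|\mathbf{u}\|^2 + 2|\mathbf{u} \cdot \mathbf{v}| + \|\mathbf{v}\|^2$  [Property of absolute value]

$\le \|\mathbf{u}\|^2 + 2\|\mathbf{u}\|\,\|\mathbf{v}\| + \|\mathbf{v}\|^2$  [Cauchy-Schwarz inequality]

$= (\|\mathbf{u}\| + \|\mathbf{v}\|)^2$

Part (a) follows by taking square roots, since both $\|\mathbf{u} + \mathbf{v}\|$ and $\|\mathbf{u}\| + \|\mathbf{v}\|$ are nonnegative.


Proof (b)  It follows from part (a) and Formula 11 that

$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \|(\mathbf{u} - \mathbf{w}) + (\mathbf{w} - \mathbf{v})\|$

$\le \|\mathbf{u} - \mathbf{w}\| + \|\mathbf{w} - \mathbf{v}\| = d(\mathbf{u}, \mathbf{w}) + d(\mathbf{w}, \mathbf{v})$


Figure 3.2.8  $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$


Figure 3.2.9  $d(\mathbf{u}, \mathbf{v}) \le d(\mathbf{u}, \mathbf{w}) + d(\mathbf{w}, \mathbf{v})$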

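It is proved in plane geometry that for any parallelogram the sum of the squares of the diagonals is equal to the
sum of the squares of the four sides (Figure 3.2.10). The following theorem generalizes that result to R^n.

THEOREM  3.2.6  Parallelogram Equation for Vectors

If u and v are vectors in R^n, then

$\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 = 2\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2\right)$  (24)


Proof

$\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 = (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) + (\mathbf{u} - \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v})$

$= 2(\mathbf{u} \cdot \mathbf{u}) + 2(\mathbf{v} \cdot \mathbf{v})$

$= 2\left(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2\right)$


Figure 3.2.10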


We could state and prove many more theorems from plane geometry that generalize to R^n, but the ones already
given should suffice to convince you that R^n is not so different from R^2 and R^3 even though we cannot
visualize it directly. The next theorem establishes a fundamental relationship between the dot product and norm
in R^n.


THEOREM  3.2.7


If u and v are vectors in R^n with the Euclidean inner product, then

$\mathbf{u} \cdot \mathbf{v} = \frac{1}{4}\|\mathbf{u} + \mathbf{v}\|^2 - \frac{1}{4}\|\mathbf{u} - \mathbf{v}\|^2$  (25)


Proof

$\|\mathbf{u} + \mathbf{v}\|^2 = (\mathbf{u} + \mathbf{v}) \cdot (\mathbf{u} + \mathbf{v}) = \|\mathbf{u}\|^2 + 2(\mathbf{u} \cdot \mathbf{v}) + \|\mathbf{v}\|^2$

$\|\mathbf{u} - \mathbf{v}\|^2 = (\mathbf{u} - \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v}) = \|\mathbf{u}\|^2 - 2(\mathbf{u} \cdot \mathbf{v}) + \|\mathbf{v}\|^2$

from which 25 follows by simple algebra.

Note that Formula 25 expresses the dot product
in terms of norms.
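
The three results of this subsection (the triangle inequality, Formula 24, and Formula 25) can be spot-checked on a concrete pair of vectors. The sketch below reuses the vectors of Example 4 purely as sample data; the helper names are illustrative. Squared norms are used for the two identities so that the integer arithmetic is exact.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def norm_sq(v):
    """Squared norm ||v||^2 = v . v (Formula 18)."""
    return dot(v, v)

u = (1, 3, -2, 7)   # sample vectors (those of Example 4)
v = (0, 7, 2, 2)

# Theorem 3.2.5(a): ||u + v|| <= ||u|| + ||v||
triangle_ok = math.sqrt(norm_sq(add(u, v))) <= math.sqrt(norm_sq(u)) + math.sqrt(norm_sq(v))

# Formula 24: ||u + v||^2 + ||u - v||^2 = 2(||u||^2 + ||v||^2)
parallelogram_ok = norm_sq(add(u, v)) + norm_sq(sub(u, v)) == 2 * (norm_sq(u) + norm_sq(v))

# Formula 25, multiplied through by 4 to stay in integers.
polarization_ok = 4 * dot(u, v) == norm_sq(add(u, v)) - norm_sq(sub(u, v))

print(triangle_ok, parallelogram_ok, polarization_ok)
```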


Dot  Products  as  Matrix  Multiplication 

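There are various ways to express the dot product of vectors using matrix notation. The formulas depend on
whether the vectors are expressed as row matrices or column matrices. Here are the possibilities.

If A is an n × n matrix and u and v are n × 1 matrices, then it follows from the first row in Table 1 and
properties of the transpose that

$A\mathbf{u} \cdot \mathbf{v} = \mathbf{v}^T(A\mathbf{u}) = (\mathbf{v}^TA)\mathbf{u} = (A^T\mathbf{v})^T\mathbf{u} = \mathbf{u} \cdot A^T\mathbf{v}$

$\mathbf{u} \cdot A\mathbf{v} = (A\mathbf{v})^T\mathbf{u} = (\mathbf{v}^TA^T)\mathbf{u} = \mathbf{v}^T(A^T\mathbf{u}) = A^T\mathbf{u} \cdot \mathbf{v}$

The resulting formulas

$A\mathbf{u} \cdot \mathbf{v} = \mathbf{u} \cdot A^T\mathbf{v}$  (26)

$\mathbf{u} \cdot A\mathbf{v} = A^T\mathbf{u} \cdot \mathbf{v}$  (27)

provide an important link between multiplication by an n × n matrix A and multiplication by $A^T$.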

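EXAMPLE  9  Verifying That $A\mathbf{u} \cdot \mathbf{v} = \mathbf{u} \cdot A^T\mathbf{v}$


Suppose that

$A = \begin{bmatrix} 1 & -2 & 3 \\ 2 & 4 & 1 \\ -1 & 0 & 1 \end{bmatrix},\quad \mathbf{u} = \begin{bmatrix} -1 \\ 2 \\ 4 \end{bmatrix},\quad \mathbf{v} = \begin{bmatrix} -2 \\ 0 \\ 5 \end{bmatrix}$

Then

$A\mathbf{u} = \begin{bmatrix} 1 & -2 & 3 \\ 2 & 4 & 1 \\ -1 & 0 & 1 \end{bmatrix}\begin{bmatrix} -1 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 7 \\ 10 \\ 5 \end{bmatrix}$

$A^T\mathbf{v} = \begin{bmatrix} 1 & 2 & -1 \\ -2 & 4 & 0 \\ 3 & 1 & 1 \end{bmatrix}\begin{bmatrix} -2 \\ 0 \\ 5 \end{bmatrix} = \begin{bmatrix} -7 \\ 4 \\ -1 \end{bmatrix}$

from which we obtain

$A\mathbf{u} \cdot \mathbf{v} = 7(-2) + 10(0) + 5(5) = 11$

$\mathbf{u} \cdot A^T\mathbf{v} = (-1)(-7) + 2(4) + 4(-1) = 11$

Thus, $A\mathbf{u} \cdot \mathbf{v} = \mathbf{u} \cdot A^T\mathbf{v}$ as guaranteed by Formula 26. We leave it for you to verify that Formula
27 also holds.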
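
The verification in Example 9, together with the Formula 27 check that is left to the reader, can be sketched in Python using plain lists. The helper names `mat_vec` and `transpose` are illustrative choices.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, x):
    """Product of a matrix (list of rows) with a column vector (list)."""
    return [dot(row, x) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Data from Example 9.
A = [[1, -2, 3],
     [2, 4, 1],
     [-1, 0, 1]]
u = [-1, 2, 4]
v = [-2, 0, 5]
At = transpose(A)

# Formula 26: Au . v = u . A^T v
print(dot(mat_vec(A, u), v), dot(u, mat_vec(At, v)))   # 11 11

# Formula 27: u . Av = A^T u . v
print(dot(u, mat_vec(A, v)), dot(mat_vec(At, u), v))   # 17 17
```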


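Table  1

| Form | Dot Product | Example |
|------|-------------|---------|
| u a column matrix and v a column matrix | $\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^T\mathbf{v} = \mathbf{v}^T\mathbf{u}$ | $\mathbf{u} = \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix}$; $\mathbf{u}^T\mathbf{v} = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix}\begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$; $\mathbf{v}^T\mathbf{u} = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$ |
| u a row matrix and v a column matrix | $\mathbf{u} \cdot \mathbf{v} = \mathbf{u}\mathbf{v} = \mathbf{v}^T\mathbf{u}^T$ | $\mathbf{u} = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix}$; $\mathbf{u}\mathbf{v} = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix}\begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$; $\mathbf{v}^T\mathbf{u}^T = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$ |
| u a column matrix and v a row matrix | $\mathbf{u} \cdot \mathbf{v} = \mathbf{v}\mathbf{u} = \mathbf{u}^T\mathbf{v}^T$ | $\mathbf{u} = \begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix}$; $\mathbf{v}\mathbf{u} = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$; $\mathbf{u}^T\mathbf{v}^T = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix}\begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$ |
| u a row matrix and v a row matrix | $\mathbf{u} \cdot \mathbf{v} = \mathbf{u}\mathbf{v}^T = \mathbf{v}\mathbf{u}^T$ | $\mathbf{u} = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix}$, $\mathbf{v} = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix}$; $\mathbf{u}\mathbf{v}^T = \begin{bmatrix} 1 & -3 & 5 \end{bmatrix}\begin{bmatrix} 5 \\ 4 \\ 0 \end{bmatrix} = -7$; $\mathbf{v}\mathbf{u}^T = \begin{bmatrix} 5 & 4 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ -3 \\ 5 \end{bmatrix} = -7$ |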

A Dot  Product  View  of  Matrix  Multiplication 


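Dot products provide another way of thinking about matrix multiplication. Recall that if $A = [a_{ij}]$ is an m × r
matrix and $B = [b_{ij}]$ is an r × n matrix, then the ijth entry of AB is

$a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ir}b_{rj}$

which is the dot product of the ith row vector of A

$\begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{ir} \end{bmatrix}$

and the jth column vector of B

$\begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{rj} \end{bmatrix}$

Thus, if the row vectors of A are $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_m$ and the column vectors of B are $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$, then the matrix
product AB can be expressed as

$AB = \begin{bmatrix} \mathbf{r}_1 \cdot \mathbf{c}_1 & \mathbf{r}_1 \cdot \mathbf{c}_2 & \cdots & \mathbf{r}_1 \cdot \mathbf{c}_n \\ \mathbf{r}_2 \cdot \mathbf{c}_1 & \mathbf{r}_2 \cdot \mathbf{c}_2 & \cdots & \mathbf{r}_2 \cdot \mathbf{c}_n \\ \vdots & \vdots & & \vdots \\ \mathbf{r}_m \cdot \mathbf{c}_1 & \mathbf{r}_m \cdot \mathbf{c}_2 & \cdots & \mathbf{r}_m \cdot \mathbf{c}_n \end{bmatrix}$  (28)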
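
Formula 28 can be read as a recipe: each entry of AB is one dot product of a row of A with a column of B. A minimal Python sketch of that recipe follows; the function name `mat_mul` and the small sample matrices are illustrative, and matrices are stored as lists of rows.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_mul(A, B):
    """Matrix product AB built entry by entry from row-column dot products (Formula 28)."""
    if len(A[0]) != len(B):
        raise ValueError("number of columns of A must equal number of rows of B")
    columns_of_B = list(zip(*B))   # c_1, ..., c_n
    return [[dot(row, col) for col in columns_of_B] for row in A]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(mat_mul(A, B))   # [[19, 22], [43, 50]]
```

Production code would normally rely on a linear algebra library instead; the nested comprehension here is meant only to mirror the structure of Formula 28.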


Application  of  Dot  Products  to  ISBN  Numbers 

Although  the  system  has  recently  changed,  most  books  published  in  the  last  25  years  have  been 
assigned  a unique  10-digit  number  called  an  International  Standard  Book  Number  or  ISBN.  The  first 
nine  digits  of  this  number  are  split  into  three  groups — the  first  group  representing  the  country  or  group 
of  countries  in  which  the  book  originates,  the  second  identifying  the  publisher,  and  the  third  assigned  to 
the  book  title  itself.  The  tenth  and  final  digit,  called  a check  digit , is  computed  from  the  first  nine  digits 
and  is  used  to  ensure  that  an  electronic  transmission  of  the  ISBN,  say  over  the  Internet,  occurs  without 
error. 

To explain how this is done, regard the first nine digits of the ISBN as a vector b in R^9, and let a be the
vector

a = (1, 2, 3, 4, 5, 6, 7, 8, 9)

Then the check digit c is computed using the following procedure:

Form the dot product a • b.

Divide a • b by 11, thereby producing a remainder c that is an integer between 0 and 10, inclusive.
The check digit is taken to be c, with the proviso that c = 10 is written as X to avoid double digits.

For example, the ISBN of the brief edition of Calculus, sixth edition, by Howard Anton is

0-471-15307-9

which has a check digit of 9. This is consistent with the first nine digits of the ISBN, since

a • b = (1, 2, 3, 4, 5, 6, 7, 8, 9) • (0, 4, 7, 1, 1, 5, 3, 0, 7) = 152

Dividing 152 by 11 produces a quotient of 13 and a remainder of 9, so the check digit is c = 9. If an
electronic  order  is  placed  for  a book  with  a certain  ISBN,  then  the  warehouse  can  use  the  above 
procedure  to  verify  that  the  check  digit  is  consistent  with  the  first  nine  digits,  thereby  reducing  the 
possibility  of  a costly  shipping  error. 
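
The check-digit procedure described above can be sketched in Python as follows. The function names are hypothetical, and the handling of hyphens and of a lowercase "x" is a convenience assumed here rather than something the text specifies.

```python
def isbn10_check_digit(first_nine):
    """Check digit for the first nine digits of a 10-digit ISBN.

    Forms the dot product a . b with a = (1, 2, ..., 9), takes the
    remainder on division by 11, and writes a remainder of 10 as "X".
    Hyphens in the input are ignored.
    """
    b = [int(ch) for ch in first_nine if ch.isdigit()]
    if len(b) != 9:
        raise ValueError("expected exactly nine digits")
    a = range(1, 10)
    c = sum(weight * digit for weight, digit in zip(a, b)) % 11
    return "X" if c == 10 else str(c)

def isbn10_is_consistent(isbn):
    """True if the tenth character agrees with the computed check digit."""
    chars = [ch for ch in isbn if ch.isdigit() or ch in "Xx"]
    if len(chars) != 10 or not all(ch.isdigit() for ch in chars[:9]):
        return False
    return isbn10_check_digit("".join(chars[:9])) == chars[9].upper()

print(isbn10_check_digit("0-471-15307"))       # 9, as in the worked example
print(isbn10_is_consistent("0-471-15307-9"))   # True
```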


Concept  Review 

Norm  (or  length  or  magnitude)  of  a vector 

Unit  vector 

Normalized  vector 

Standard  unit  vectors 

Distance  between  points  in  Rn 

Angle  between  two  vectors  in  Rn 

Dot  product  (or  Euclidean  inner  product)  of  two  vectors  in  Rn 
Cauchy-Schwarz  inequality 
Triangle  inequality 
Parallelogram  equation  for  vectors 

Skills 

Compute  the  norm  of  a vector  in  Rn. 

Determine  whether  a given  vector  in  Rn  is  a unit  vector. 

Normalize  a nonzero  vector  in  Rn. 

Determine  the  distance  between  two  vectors  in  Rn. 

Compute  the  dot  product  of  two  vectors  in  Rn. 

Compute  the  angle  between  two  nonzero  vectors  in  Rn. 

Prove basic properties pertaining to norms and dot products (Theorems 3.2.1-3.2.3 and 3.2.5-3.2.7).


Exercise  Set  3.2 


In Exercises 1-2, find the norm of v, a unit vector that has the same direction as v, and a unit vector that is
oppositely directed to v.

1.  (a)  v = (4, -3)

(b)  v = (2, 2, 2)

(c)  v = (1, 0, 2, 1, 3)

2.  (a)  v = (-5, 12)

(b)  v = (1, -1, 2)

(c)  v = (-2, 3, 3, -1)

In Exercises 3-4, evaluate the given expression with u = (2, -2, 3), v = (1, -3, 4), and
w = (3, 6, -4).

3.  (a)  ||u + v||

(b)  ||u|| + ||v||

(c)  ||-2u + 2v||

(d)  ||3u - 5v + w||

Answer:

(a)  ||u + v|| = $\sqrt{83}$

(b)  ||u|| + ||v|| = $\sqrt{17} + \sqrt{26}$

(c)  ||-2u + 2v|| = $2\sqrt{3}$

(d)  ||3u - 5v + w|| = $\sqrt{466}$

4.  (a)  ||u + v + w||

(b)  ||u - v||

(c)  ||3v|| - 3||v||

(d)  ||u|| - ||v||

In Exercises 5-6, evaluate the given expression with u = (-2, -1, 4, 5), v = (3, 1, -5, 7), and
w = (-6, 2, 1, 1).

5.  (a)  ||3u - 5v + w||

(b)  ||3u|| - 5||v|| + ||w||

(c)  || -||u|| v ||

Answer:

(a)  ||3u - 5v + w|| = $\sqrt{2570}$

(b)  ||3u|| - 5||v|| + ||w|| = $3\sqrt{46} - 10\sqrt{21} + \sqrt{42}$

(c)  || -||u|| v || = $2\sqrt{966}$

6.  (a)  ||u|| - 2||v|| - 3||w||

(b)  ||u|| + ||-2v|| + ||-3w||

(c)  || ||u - v|| w ||

7.  Let v = (-2, 3, 0, 6). Find all scalars k such that ||kv|| = 5.


Answer:

k = 5/7,  k = -5/7

8.  Let v = (1, 1, 2, -3, 1). Find all scalars k such that ||kv|| = 4.

In Exercises 9-10, find u • v, u • u, and v • v.

9. (a) u = (3, 1, 4), v = (2, 2, -4)

(b) u = (1, 1, 4, 6), v = (2, -2, 3, -2)

Answer:

(a) u · v = -8, u · u = 26, v · v = 24

(b) u · v = 0, u · u = 54, v · v = 21

10. (a) u = (1, 1, -2, 3), v = (-1, 0, 5, 1)

(b) u = (2, -1, 1, 0, -2), v = (1, 2, 2, 2, 1)

In  Exercises  11-12,  find  the  Euclidean  distance  between  u and  v. 

11. (a) u = (3, 3, 3), v = (1, 0, 4)

(b) u = (0, -2, -1, 1), v = (-3, 2, 4, 4)

(c) u = (3, -3, -2, 0, -3, 13, 5), v = (-4, 1, -1, 5, 0, -11, 4)

Answer:

(a) ||u - v|| = √14

(b) ||u - v|| = √59

(c) ||u - v|| = √677

12. (a) u = (1, 2, -3, 0), v = (5, 1, 2, -2)

(b) u = (2, -1, -4, 1, 0, 6, -3, 1), v = (-2, -1, 0, 3, 7, 2, -5, 1)

(c) u = (0, 1, 1, 1, 2), v = (2, 1, 0, -1, 3)

13.  Find  the  cosine  of  the  angle  between  the  vectors  in  each  part  of  Exercise  11,  and  then  state  whether  the 
angle  is  acute,  obtuse,  or  90°. 

Answer: 

(a) cos θ = 15 / (√27 √17); θ is acute

(b) cos θ = -4 / (√6 √45); θ is obtuse

(c) cos θ = -136 / (√225 √180); θ is obtuse

14.  Find  the  cosine  of  the  angle  between  the  vectors  in  each  part  of  Exercise  12,  and  then  state  whether  the 
angle  is  acute,  obtuse,  or  90°. 

15. Suppose that a vector a in the xy-plane has a length of 9 units and points in a direction that is 120° counterclockwise from the positive x-axis, and a vector b in that plane has a length of 5 units and points in the positive y-direction. Find a · b.

Answer:

a · b = 45√3 / 2

16.  Suppose  that  a vector  a in  the  xy-plane  points  in  a direction  that  is  47°  counterclockwise  from  the  positive 
x-axis,  and  a vector  b in  that  plane  points  in  a direction  that  is  43°  clockwise  from  the  positive  x-axis.  What 
can  you  say  about  the  value  of  a • b? 

In  Exercises  17-18,  determine  whether  the  expression  makes  sense  mathematically.  If  not,  explain  why. 

17. (a) u · (v · w)

(b) u · (v + w)

(c) ||u · v||

(d) (u · v) - ||u||

Answer:

(a) u · (v · w) does not make sense because v · w is a scalar.

(b) u · (v + w) makes sense.

(c) ||u · v|| does not make sense because the quantity inside the norm is a scalar.

(d) (u · v) - ||u|| makes sense since the terms are both scalars.

18. (a) ||u|| · ||v||

(b) (u · v) - w

(c) (u · v) - k

(d) k · u

19.  Find  a unit  vector  that  has  the  same  direction  as  the  given  vector. 

(a) (-4, -3)

(b) (1, 7)

(c) (-3, 2, √3)

(d) (1, 2, 3, 4, 5)

Answer:

(a) (-4/5, -3/5)

(b) (1/(5√2), 7/(5√2))

(c) (-3/4, 1/2, √3/4)

(d) (1/√55, 2/√55, 3/√55, 4/√55, 5/√55)


20.  Find  a unit  vector  that  is  oppositely  directed  to  the  given  vector. 

(a)  (-12,  -5) 

(b)  (3,  -3,-3) 

(c)  ( - 6,  8) 

(d) (-3, 1, √6, 3)

21.  State  a procedure  for  finding  a vector  of  a specified  length  m that  points  in  the  same  direction  as  a given 
vector  v. 

22.  If  || v||  = 2 and  ||w||  = 3,  what  are  the  largest  and  smallest  values  possible  for  ||v  — w||?  Give  a geometric 
explanation  of  your  results. 

23. Find the cosine of the angle θ between u and v.

(a) u = (2, 3), v = (5, -7)

(b) u = (-6, -2), v = (4, 0)

(c) u = (1, -5, 4), v = (3, 3, 3)

(d) u = (-2, 2, 3), v = (1, 7, -4)

Answer:

(a) cos θ = -11/√962

(b) cos θ = -3/√10

(c) cos θ = 0

(d) cos θ = 0

24. Find the radian measure of the angle θ (with 0 ≤ θ ≤ π) between u and v.

(a)  (1,  -7)  and  (21,  3) 

(b)  (0,  2)  and  (3,  - 3) 

(c)  (-1,  1,0)  and  (0,  -1,1) 

(d)  (1,  -1,0)  and  (1,0,0) 

In Exercises 25-26, verify that the Cauchy-Schwarz inequality holds.

25. (a) u = (3, 2), v = (4, -1)

(b) u = (-3, 1, 0), v = (2, -1, 3)

(c) u = (0, 2, 2, 1), v = (1, 1, 1, 1)

Answer:

(a) |u · v| = 10, ||u|| ||v|| = √13 √17 ≈ 14.866

(b) |u · v| = 7, ||u|| ||v|| = √10 √14 ≈ 11.832

(c) |u · v| = 5, ||u|| ||v|| = (3)(2) = 6

26. (a) u = (4, 1, 1), v = (1, 2, 3)

(b) u = (1, 2, 1, 2, 3), v = (0, 1, 1, 5, -2)

(c) u = (1, 3, 5, 2, 0, 1), v = (0, 2, 4, 1, 3, 5)

27. Let p0 = (x0, y0, z0) and p = (x, y, z). Describe the set of all points (x, y, z) for which ||p - p0|| = 1.

Answer:

A sphere of radius 1 centered at (x0, y0, z0).

28. (a) Show that the components of the vector v = (v1, v2) in Figure Ex-28a are v1 = ||v|| cos θ and v2 = ||v|| sin θ.

(b) Let u and v be the vectors in Figure Ex-28b. Use the result in part (a) to find the components of 4u - 5v.

Figure Ex-28


29.  Prove  parts  (a)  and  ( b ) of  Theorem  3.2.1. 

30.  Prove  parts  (a)  and  (c)  of  Theorem  3.2.3. 

31.  Prove  parts  (d)  and  (e)  of  Theorem  3.2.3. 

32.  Under  what  conditions  will  the  triangle  inequality  (Theorem  3.2.5a)  be  an  equality?  Explain  your  answer 
geometrically. 

33. What can you say about two nonzero vectors, u and v, that satisfy the equation ||u + v|| = ||u|| + ||v||?

34. (a) What relationship must hold for the point p = (a, b, c) to be equidistant from the origin and the xz-plane? Make sure that the relationship you state is valid for positive and negative values of a, b, and c.

(b) What relationship must hold for the point p = (a, b, c) to be farther from the origin than from the xz-plane? Make sure that the relationship you state is valid for positive and negative values of a, b, and c.


True-False  Exercises 


In  parts  (a)-(j)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  If  each  component  of  a vector  in  is  doubled,  the  norm  of  that  vector  is  doubled. 

Answer: 

True 

(b) In R2, the vectors of norm 5 whose initial points are at the origin have terminal points lying on a circle of
radius  5 centered  at  the  origin. 

Answer: 

True 

(c)  Every  vector  in  Rn  has  a positive  norm. 

Answer: 


False 


(d)  If  v is  a nonzero  vector  in  Rn,  there  are  exactly  two  unit  vectors  that  are  parallel  to  v. 

Answer: 

True 

(e) If ||u|| = 2, ||v|| = 1, and u · v = 1, then the angle between u and v is π/3 radians.

Answer: 

True 

(f) The expressions (u · v) + w and u · (v + w) are both meaningful and equal to each other.
Answer: 

False 

(g) If u · v = u · w, then v = w.

Answer: 

False 

(h) If u · v = 0, then either u = 0 or v = 0.
Answer: 

False 

(i) In R2, if u lies in the first quadrant and v lies in the third quadrant, then u · v cannot be positive.
Answer: 

True 

(j)  For  all  vectors  u,  v,  and  w in  Rn,  we  have 

||u + v + w|| ≤ ||u|| + ||v|| + ||w||

Answer: 

True 
3.3  Orthogonality

In  the  last  section  we  defined  the  notion  of  “angle”  between  vectors  in  Rn.  In  this  section  we  will  focus  on  the  notion  of 
“perpendicularity.”  Perpendicular  vectors  in  Rn  play  an  important  role  in  a wide  variety  of  applications. 


Orthogonal  Vectors 


Recall from Formula 20 in the previous section that the angle θ between two nonzero vectors u and v in Rn is defined by the formula

θ = cos⁻¹( (u · v) / (||u|| ||v||) )

It follows from this that θ = π/2 if and only if u · v = 0. Thus, we make the following definition.




DEFINITION  1 

Two nonzero vectors u and v in Rn are said to be orthogonal (or perpendicular) if u · v = 0. We will also agree that the zero vector in Rn is orthogonal to every vector in Rn. A nonempty set of vectors in Rn is called an orthogonal set if all pairs of distinct vectors in the set are orthogonal. An orthogonal set of unit vectors is called an orthonormal set.



EXAMPLE  1 Orthogonal  Vectors 

(a) Show that u = (-2, 3, 1, 4) and v = (1, 2, 0, -1) are orthogonal vectors in R4.

(b) Show that the set S = {i, j, k} of standard unit vectors is an orthogonal set in R3.

Solution

(a) The vectors are orthogonal since

u · v = (-2)(1) + (3)(2) + (1)(0) + (4)(-1) = 0

(b) We must show that all pairs of distinct vectors are orthogonal, that is,

i · j = i · k = j · k = 0

This is evident geometrically (Figure 3.2.2), but it can be seen as well from the computations

i · j = (1, 0, 0) · (0, 1, 0) = 0
i · k = (1, 0, 0) · (0, 0, 1) = 0
j · k = (0, 1, 0) · (0, 0, 1) = 0


In  Example  1 there  is  no  need  to  check  that 

j · i = k · i = k · j = 0

since  this  follows  from  computations  in  the  example  and 
the  symmetry  property  of  the  dot  product. 
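The pairwise check carried out in Example 1 can be sketched in Python as follows. The helper names is_orthogonal_set and is_orthonormal_set are hypothetical, and the tolerance tol is an assumption added so that the same test also works with floating-point entries.

```python
from itertools import combinations

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def is_orthogonal_set(vectors, tol=1e-12):
    """True if every pair of distinct vectors has dot product zero."""
    # combinations yields each unordered pair once, which suffices
    # because the dot product is symmetric.
    return all(abs(dot(u, v)) <= tol for u, v in combinations(vectors, 2))

def is_orthonormal_set(vectors, tol=1e-12):
    """True if the set is orthogonal and every vector has norm 1."""
    return is_orthogonal_set(vectors, tol) and all(
        abs(dot(v, v) - 1) <= tol for v in vectors
    )

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(is_orthogonal_set([i, j, k]))
print(is_orthogonal_set([(-2, 3, 1, 4), (1, 2, 0, -1)]))
```

Only unordered pairs are tested, for the same reason given in the remark above: symmetry of the dot product makes the reversed pairs redundant.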


Lines  and  Planes  Determined  by  Points  and  Normals 


One learns in analytic geometry that a line in R2 is determined uniquely by its slope and one of its points, and that a plane in R3 is determined uniquely by its “inclination” and one of its points. One way of specifying slope and inclination is to use a nonzero vector n, called a normal, that is orthogonal to the line or plane in question. For example, Figure 3.3.1 shows the line through the point P0(x0, y0) that has normal n = (a, b) and the plane through the point P0(x0, y0, z0) that has normal n = (a, b, c). Both the line and the plane are represented by the vector equation

n · P0P = 0  (1)

where P is either an arbitrary point (x, y) on the line or an arbitrary point (x, y, z) in the plane. The vector P0P can be expressed in terms of components as

P0P = (x - x0, y - y0)  [line]

P0P = (x - x0, y - y0, z - z0)  [plane]

Thus, Equation 1 can be written as

a(x - x0) + b(y - y0) = 0  [line]  (2)

a(x - x0) + b(y - y0) + c(z - z0) = 0  [plane]  (3)

These are called the point-normal equations of the line and plane.

EXAMPLE  2 Point-Normal  Equations 

It follows from 2 that in R2 the equation

6(x - 3) + (y + 7) = 0

represents the line through the point (3, -7) with normal n = (6, 1); and it follows from 3 that in R3 the equation

4(x - 3) + 2y - 5(z - 7) = 0

represents the plane through the point (3, 0, 7) with normal n = (4, 2, -5).
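Multiplying out a point-normal equation and combining the constants is mechanical, so it can be sketched in a short Python function. The name point_normal_to_general is hypothetical; the function returns the coefficients of the expanded equation, with the constant term last.

```python
def point_normal_to_general(normal, point):
    """Expand n . (x - p) = 0 into n . x + constant = 0.

    Returns the components of n followed by the constant term,
    which equals -(n . p).
    """
    constant = -sum(n * p for n, p in zip(normal, point))
    return tuple(normal) + (constant,)

# Line of Example 2: 6(x - 3) + (y + 7) = 0 becomes 6x + y - 11 = 0.
print(point_normal_to_general((6, 1), (3, -7)))

# Plane of Example 2: 4(x - 3) + 2y - 5(z - 7) = 0 becomes 4x + 2y - 5z + 23 = 0.
print(point_normal_to_general((4, 2, -5), (3, 0, 7)))
```

The same function handles lines in R2 and planes in R3 because the point-normal form has the same structure in both cases.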




When  convenient,  the  terms  in  Equations  2 and  3 can  be  multiplied  out  and  the  constants  combined.  This  leads  to  the  following 
theorem. 


THEOREM  3.3.1 


(a) If a and b are constants that are not both zero, then an equation of the form

ax + by + c = 0  (4)

represents a line in R2 with normal n = (a, b).

(b) If a, b, and c are constants that are not all zero, then an equation of the form

ax + by + cz + d = 0  (5)

represents a plane in R3 with normal n = (a, b, c).


EXAMPLE  3 Vectors  Orthogonal  to  Lines  and  Planes  Through  the  Origin 

The equation ax + by = 0 represents a line through the origin in R2. Show that the vector n1 = (a, b) formed from the coefficients of the equation is orthogonal to the line, that is, orthogonal to every vector along the line.
The equation ax + by + cz = 0 represents a plane through the origin in R3. Show that the vector n2 = (a, b, c) formed from the coefficients of the equation is orthogonal to the plane, that is, orthogonal to every vector that lies in the plane.


We will solve both problems together. The two equations can be written as

(a, b) · (x, y) = 0 and (a, b, c) · (x, y, z) = 0

or, alternatively, as

n1 · (x, y) = 0 and n2 · (x, y, z) = 0

These equations show that n1 is orthogonal to every vector (x, y) on the line and that n2 is orthogonal to every vector (x, y, z) in the plane (Figure 3.3.1).


Recall that

ax + by = 0 and ax + by + cz = 0

are called homogeneous equations. Example 3 illustrates that homogeneous equations in two or three unknowns can be written in the vector form

n · x = 0  (6)

where n is the vector of coefficients and x is the vector of unknowns. In R2 this is called the vector form of a line through the origin, and in R3 it is called the vector form of a plane through the origin.

Referring  to  Table  1 of  Section  3.2,  in  what  other  ways 
can  you  write  6 if  n and  x are  expressed  in  matrix  form? 


Orthogonal  Projections 

In many applications it is necessary to “decompose” a vector u into a sum of two terms, one term being a scalar multiple of a specified nonzero vector a and the other term being orthogonal to a. For example, if u and a are vectors in R2 that are positioned so their initial points coincide at a point Q, then we can create such a decomposition as follows (Figure 3.3.2):

Drop a perpendicular from the tip of u to the line through a.

Construct the vector w1 from Q to the foot of the perpendicular.

Construct the vector w2 = u - w1.


Figure 3.3.2  In parts (b) through (d), u = w1 + w2, where w1 is parallel to a and w2 is orthogonal to a.


Since

w1 + w2 = w1 + (u - w1) = u

we have decomposed u into a sum of two orthogonal vectors, the first term being a scalar multiple of a and the second being orthogonal to a.

The following theorem shows that the foregoing results, which we illustrated using vectors in R2, apply as well in Rn.


Projection  Theorem 

If u and a are vectors in Rn, and if a ≠ 0, then u can be expressed in exactly one way in the form u = w1 + w2, where w1 is a scalar multiple of a and w2 is orthogonal to a.


Since the vector w1 is to be a scalar multiple of a, it must have the form

w1 = ka  (7)

Our goal is to find a value of the scalar k and a vector w2 that is orthogonal to a such that

u = w1 + w2  (8)

We can determine k by using 7 to rewrite 8 as

u = w1 + w2 = ka + w2

and then applying Theorems 3.2.2 and 3.2.3 to obtain

u · a = (ka + w2) · a = k||a||^2 + (w2 · a)  (9)

Since w2 is to be orthogonal to a, the last term in 9 must be 0, and hence k must satisfy the equation

u · a = k||a||^2

from which we obtain

k = (u · a) / ||a||^2

as the only possible value for k. The proof can be completed by rewriting 8 as

w2 = u - w1 = u - ka = u - ((u · a) / ||a||^2) a

and then confirming that w2 is orthogonal to a by showing that w2 · a = 0 (we leave the details for you).

The vectors w1 and w2 in the Projection Theorem have associated names: the vector w1 is called the orthogonal projection of u on a or sometimes the vector component of u along a, and the vector w2 is called the vector component of u orthogonal to a. The vector w1 is commonly denoted by the symbol proj_a u, in which case it follows from 8 that w2 = u - proj_a u. In summary,

proj_a u = ((u · a) / ||a||^2) a  (vector component of u along a)  (10)

u - proj_a u = u - ((u · a) / ||a||^2) a  (vector component of u orthogonal to a)  (11)

EXAMPLE  4 Orthogonal  Projection  on  a Line 

Find the orthogonal projections of the vectors e1 = (1, 0) and e2 = (0, 1) on the line L that makes an angle θ with the positive x-axis in R2.

As illustrated in Figure 3.3.3, a = (cos θ, sin θ) is a unit vector along the line L, so our first problem is to find the orthogonal projection of e1 along a. Since

||a|| = √(sin^2 θ + cos^2 θ) = 1 and e1 · a = (1, 0) · (cos θ, sin θ) = cos θ

it follows from Formula 10 that this projection is

proj_a e1 = ((e1 · a) / ||a||^2) a = (cos θ)(cos θ, sin θ) = (cos^2 θ, sin θ cos θ)

Similarly, since e2 · a = (0, 1) · (cos θ, sin θ) = sin θ, it follows from Formula 10 that

proj_a e2 = ((e2 · a) / ||a||^2) a = (sin θ)(cos θ, sin θ) = (sin θ cos θ, sin^2 θ)


EXAMPLE  5 Vector  Component  of  u Along  a 


Let  u = (2,  —1,3)  and  a = (4,  —1,2).  Find  the  vector  component  of  u along  a and  the  vector  component  of  u 
orthogonal  to  a. 


Solution 

u · a = (2)(4) + (-1)(-1) + (3)(2) = 15
||a||^2 = 4^2 + (-1)^2 + 2^2 = 21

Thus the vector component of u along a is

proj_a u = ((u · a) / ||a||^2) a = (15/21)(4, -1, 2) = (20/7, -5/7, 10/7)

and the vector component of u orthogonal to a is

u - proj_a u = (2, -1, 3) - (20/7, -5/7, 10/7) = (-6/7, -2/7, 11/7)

As  a check,  you  may  wish  to  verify  that  the  vectors  u — projau  and  a are  perpendicular  by  showing  that  their  dot 
product  is  zero. 
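The decomposition in Example 5, together with the suggested check, can be reproduced with exact rational arithmetic. The following Python sketch uses hypothetical helper names (project, decompose) and assumes the vectors have integer or Fraction entries so that the results are exact.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))

def project(u, a):
    """Vector component of u along a nonzero vector a (Formula 10)."""
    if not any(a):
        raise ValueError("a must be a nonzero vector")
    k = Fraction(dot(u, a), dot(a, a))  # k = (u . a) / ||a||^2
    return tuple(k * x for x in a)

def decompose(u, a):
    """Return (w1, w2) with u = w1 + w2, w1 along a, w2 orthogonal to a."""
    w1 = project(u, a)
    w2 = tuple(x - y for x, y in zip(u, w1))
    return w1, w2

u = (2, -1, 3)
a = (4, -1, 2)
w1, w2 = decompose(u, a)
print(w1)          # components 20/7, -5/7, 10/7
print(w2)          # components -6/7, -2/7, 11/7
print(dot(w2, a))  # 0, so w2 is orthogonal to a
```

Using Fraction rather than floating-point numbers is a design choice that lets the orthogonality check come out exactly zero.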


Figure 3.3.3


Sometimes  we  will  be  more  interested  in  the  norm  of  the  vector  component  of  u along  a than  in  the  vector  component  itself.  A 
formula  for  this  norm  can  be  derived  as  follows: 


||proj_a u|| = || ((u · a) / ||a||^2) a || = | (u · a) / ||a||^2 | ||a|| = (|u · a| / ||a||^2) ||a||

where the second equality follows from part (c) of Theorem 3.2.1 and the third from the fact that ||a||^2 > 0. Thus,

||proj_a u|| = |u · a| / ||a||  (12)

If θ denotes the angle between u and a, then u · a = ||u|| ||a|| cos θ, so 12 can also be written as

||proj_a u|| = ||u|| |cos θ|  (13)

(Verify.) A geometric interpretation of this result is given in Figure 3.3.4.

Figure 3.3.4  (a) 0 ≤ θ < π/2, where the projection has length ||u|| cos θ; (b) π/2 < θ ≤ π, where the projection has length -||u|| cos θ
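As a numerical illustration, Formulas 12 and 13 can be compared directly. The sketch below is in Python with hypothetical function names; the sample vectors are those of Exercise 19(a) in this section, and both vectors are assumed to be nonzero.

```python
import math

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))

def norm(v):
    """Euclidean norm of v."""
    return math.sqrt(dot(v, v))

def proj_norm(u, a):
    """Formula 12: |u . a| / ||a|| for a nonzero vector a."""
    return abs(dot(u, a)) / norm(a)

def proj_norm_from_angle(u, a):
    """Formula 13: ||u|| |cos(theta)| for nonzero u and a."""
    cos_theta = dot(u, a) / (norm(u) * norm(a))
    return norm(u) * abs(cos_theta)

u = (1, -2)
a = (-4, -3)
print(proj_norm(u, a), proj_norm_from_angle(u, a))
```

The two functions agree up to floating-point round-off, which is the content of the "(Verify.)" remark above.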


The  Theorem  of  Pythagoras 

In Section 3.2 we found that many theorems about vectors in R2 and R3 also hold in Rn. Another example of this is the following
generalization  of  the  Theorem  of  Pythagoras  (Figure  3.3.5). 

Theorem  of  Pythagoras  in  Rn 

If  u and  v are  orthogonal  vectors  in  Rn  with  the  Euclidean  inner  product,  then 

||u + v||^2 = ||u||^2 + ||v||^2  (14)


Since u and v are orthogonal, we have u · v = 0, from which it follows that


||u + v||^2 = (u + v) · (u + v) = ||u||^2 + 2(u · v) + ||v||^2 = ||u||^2 + ||v||^2

EXAMPLE  6 Theorem  of  Pythagoras  in  R4 

We  showed  in  Example  1 that  the  vectors 

u=  ( — 2,  3,  1,4)  and  v=  (1,2,  0,-1) 
are  orthogonal.  Verify  the  Theorem  of  Pythagoras  for  these  vectors. 

We  leave  it  for  you  to  confirm  that 

u + v = (-1, 5, 1, 3)

||u + v||^2 = 36
||u||^2 + ||v||^2 = 30 + 6

Thus, ||u + v||^2 = ||u||^2 + ||v||^2
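The confirmation requested in Example 6 can also be carried out by machine. In the Python sketch below, the function pythagoras_gap is a hypothetical helper that returns ||u + v||^2 - ||u||^2 - ||v||^2; by the computation in the proof above, this quantity equals 2(u · v), so it is zero exactly when u and v are orthogonal.

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))

def norm_sq(v):
    """Squared Euclidean norm, ||v||^2."""
    return dot(v, v)

def pythagoras_gap(u, v):
    """||u + v||^2 - ||u||^2 - ||v||^2, which equals 2(u . v)."""
    s = tuple(a + b for a, b in zip(u, v))
    return norm_sq(s) - norm_sq(u) - norm_sq(v)

u = (-2, 3, 1, 4)
v = (1, 2, 0, -1)
print(norm_sq(u), norm_sq(v), pythagoras_gap(u, v))
```

Working with squared norms avoids square roots, so the check is exact for integer entries.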


Figure 3.3.5


OPTIONAL 

Distance  Problems 

We  will  now  show  how  orthogonal  projections  can  be  used  to  solve  the  following  three  distance  problems: 

Problem 1. Find the distance between a point and a line in R2.

Problem 2. Find the distance between a point and a plane in R3.

Problem 3. Find the distance between two parallel planes in R3.

A method  for  solving  the  first  two  problems  is  provided  by  the  next  theorem.  Since  the  proofs  of  the  two  parts  are  similar,  we  will 
prove  part  ( b ) and  leave  part  ( a ) as  an  exercise. 


THEOREM  3.3.4 

(a) In R2 the distance D between the point P0(x0, y0) and the line ax + by + c = 0 is

D = |ax0 + by0 + c| / √(a^2 + b^2)  (15)

(b) In R3 the distance D between the point P0(x0, y0, z0) and the plane ax + by + cz + d = 0 is

D = |ax0 + by0 + cz0 + d| / √(a^2 + b^2 + c^2)  (16)


Proof (b) Let Q(x1, y1, z1) be any point in the plane. Position the normal n = (a, b, c) so that its initial point is at Q. As illustrated in Figure 3.3.6, the distance D is equal to the length of the orthogonal projection of QP0 on n. Thus, it follows from Formula 12 that

D = ||proj_n QP0|| = |QP0 · n| / ||n||

But

QP0 = (x0 - x1, y0 - y1, z0 - z1)

QP0 · n = a(x0 - x1) + b(y0 - y1) + c(z0 - z1)

||n|| = √(a^2 + b^2 + c^2)

Thus

D = |a(x0 - x1) + b(y0 - y1) + c(z0 - z1)| / √(a^2 + b^2 + c^2)  (17)

Since the point Q(x1, y1, z1) lies in the given plane, its coordinates satisfy the equation of that plane; thus

ax1 + by1 + cz1 + d = 0

or

d = -ax1 - by1 - cz1

Substituting this expression in 17 yields 16.


EXAMPLE 7 Distance Between a Point and a Plane


Find the distance D between the point (1, -4, -3) and the plane 2x - 3y + 6z = -1.

Since the distance formulas in Theorem 3.3.4 require that the equations of the line and plane be written with zero on the right side, we first need to rewrite the equation of the plane as

2x - 3y + 6z + 1 = 0

from which we obtain

D = |2(1) + (-3)(-4) + 6(-3) + 1| / √(2^2 + (-3)^2 + 6^2) = |-3| / 7 = 3/7
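Formulas 15 and 16 have the same shape, so a single function can cover both. The following Python sketch uses a hypothetical name, distance_to_hyperplane, and takes the coefficient vector and the constant term of an equation already written with zero on the right side.

```python
import math

def distance_to_hyperplane(point, coeffs, constant):
    """Distance from a point to the set coeffs . x + constant = 0.

    With two coefficients this is Formula 15 (a line in R2);
    with three it is Formula 16 (a plane in R3).
    """
    n_norm = math.sqrt(sum(c * c for c in coeffs))
    if n_norm == 0:
        raise ValueError("the coefficients must not all be zero")
    value = sum(c * p for c, p in zip(coeffs, point)) + constant
    return abs(value) / n_norm

# Example 7: the point (1, -4, -3) and the plane 2x - 3y + 6z + 1 = 0.
print(distance_to_hyperplane((1, -4, -3), (2, -3, 6), 1))

# Exercise 29 of this section: the point (-3, 1) and the line 4x + 3y + 4 = 0.
print(distance_to_hyperplane((-3, 1), (4, 3), 4))
```

As in Example 7, an equation such as 2x - 3y + 6z = -1 must first be rewritten with zero on the right side before its constant term is passed in.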


Figure 3.3.6


The third distance problem posed above is to find the distance between two parallel planes in R3. As suggested in Figure 3.3.7, the distance between a plane V and a plane W can be obtained by finding any point P0 in one of the planes, and computing the distance between that point and the other plane. Here is an example.


Figure 3.3.7  The distance between the parallel planes V and W is equal to the distance between P0 and W.


EXAMPLE  8 Distance  Between  Parallel  Planes 


The  planes 


x + 2y - 2z = 3 and 2x + 4y - 4z = 7


are  parallel  since  their  normals,  (1,2,  — 2)  and  (2,  4,  — 4),  are  parallel  vectors.  Find  the  distance  between  these 
planes. 


To find the distance D between the planes, we can select an arbitrary point in one of the planes and compute its distance to the other plane. By setting y = z = 0 in the equation x + 2y - 2z = 3, we obtain the point P0(3, 0, 0) in this plane. From 16, the distance between P0 and the plane 2x + 4y - 4z = 7 is

D = |2(3) + 4(0) + (-4)(0) - 7| / √(2^2 + 4^2 + (-4)^2) = 1/6
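The method of Example 8 can be sketched in Python as follows. The function name and its argument convention (each plane given as n · x = d) are assumptions made for this illustration. Note one deliberate difference from the example: instead of setting coordinates to zero, the sketch picks the point of the first plane that lies along its normal, which works for any nonzero normal.

```python
import math

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))

def distance_between_parallel_planes(n1, d1, n2, d2):
    """Distance between the planes n1 . x = d1 and n2 . x = d2."""
    if not any(n1) or not any(n2):
        raise ValueError("normals must be nonzero")
    # Normals are parallel when all 2x2 minors vanish
    # (exact comparison; intended for integer or rational input).
    m = len(n1)
    for i in range(m):
        for j in range(i + 1, m):
            if n1[i] * n2[j] != n1[j] * n2[i]:
                raise ValueError("the planes are not parallel")
    # A point p on the first plane: p = k n1 with k chosen so n1 . p = d1.
    k = d1 / dot(n1, n1)
    p = tuple(k * c for c in n1)
    # Distance from p to the second plane, as in Formula 16.
    return abs(dot(n2, p) - d2) / math.sqrt(dot(n2, n2))

# Example 8: x + 2y - 2z = 3 and 2x + 4y - 4z = 7.
print(distance_between_parallel_planes((1, 2, -2), 3, (2, 4, -4), 7))
```

Any point of the first plane gives the same distance, so the choice of point affects only convenience, not the result.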


Concept  Review 

Orthogonal  (perpendicular)  vectors 

Orthogonal  set  of  vectors 

Normal  to  a line 

Normal  to  a plane 

Point-normal  equations 

Vector  form  of  a line 

Vector  form  of  a plane 

Orthogonal  projection  of  u on  a 

Vector  component  of  u along  a 

Vector  component  of  u orthogonal  to  a 

Theorem  of  Pythagoras 

Skills 

Determine  whether  two  vectors  are  orthogonal. 

Determine  whether  a given  set  of  vectors  forms  an  orthogonal  set. 

Find  equations  for  lines  (or  planes)  by  using  a normal  vector  and  a point  on  the  line  (or  plane). 
Find  the  vector  form  of  a line  or  plane  through  the  origin. 

Compute  the  vector  component  of  u along  a and  orthogonal  to  a. 


Find the distance between a point and a line in R2 or R3.
Find the distance between two parallel planes in R3.
Find the distance between a point and a plane.


Exercise  Set  3.3 

In  Exercises  1-2,  determine  whether  u and  v are  orthogonal  vectors. 

1. (a)  u=  (6,  1, 4),  v=  (2,  0,  -3) 

(b)  u = (0,  0,  — 1),  v=  (1,1,1) 

(c)  u=(-6,0,4),  v=  (3,  1,  6) 

(d)  u = (2,  4,  -8),  v=  (5,  3, 7) 

Answer: 

(a)  Orthogonal 

(b)  Not  orthogonal 

(c)  Not  orthogonal 

(d)  Not  orthogonal 

2. (a)  u = (2,  3),  v=  (5,  -7) 

(b)  u = (-6,  -2),  v=  (4,  0) 

(C)  u=  (1,  -5,4),  v=  (3,  3,  3) 

(d)  u = ( — 2,  2,  3),  v=  (1,7,  -4) 

In  Exercises  3-4,  determine  whether  the  vectors  form  an  orthogonal  set. 

3. (a) v1 = (2, 3), v2 = (3, 2)

(b) v1 = (-1, 1), v2 = (1, 1)

(c) v1 = (-2, 1, 1), v2 = (1, 0, 2), v3 = (-2, -5, 1)

(d) v1 = (-3, 4, -1), v2 = (1, 2, 5), v3 = (4, -3, 0)

Answer: 

(a)  Not  an  orthogonal  set 

(b)  Orthogonal  set 

(c)  Orthogonal  set 

(d)  Not  an  orthogonal  set 

4. (a) v1 = (2, 3), v2 = (-3, 2)

(b) v1 = (1, -2), v2 = (-2, 1)

(c) v1 = (1, 0, 1), v2 = (1, 1, 1), v3 = (-1, 0, 1)

(d) v1 = (2, -2, 1), v2 = (2, 1, -2), v3 = (1, 2, 2)

5.  Find  a unit  vector  that  is  orthogonal  to  both  u = (1,  0,  1)  and  v = (0,  1, 


Answer: 


±(/5’  k '^) 

6. (a) Show that v = (a, b) and w = (-b, a) are orthogonal vectors.

(b)  Use  the  result  in  part  (a)  to  find  two  vectors  that  are  orthogonal  tov=(2,  — 3). 

(c)  Find  two  unit  vectors  that  are  orthogonal  to  ( — 3,4). 

7. Do the points A(1, 1, 1), B(-2, 0, 3), and C(-3, -1, 1) form the vertices of a right triangle? Explain your answer.

Answer:

Yes

8. Repeat Exercise 7 for the points A(3, 0, 2), B(4, 3, 0), and C(8, 1, -1).

In  Exercises  9-12,  find  a point-normal  form  of  the  equation  of  the  plane  passing  through  P and  having  n as  a normal. 

9. P(-1, 3, -2); n = (-2, 1, -1)

Answer:

-2(x + 1) + (y - 3) - (z + 2) = 0

10. P(1, 1, 4); n = (1, 9, 8)

11. P(2, 0, 0); n = (0, 0, 2)

Answer:

2z = 0

12. P(0, 0, 0); n = (1, 2, 3)

In  Exercises  13-16,  determine  whether  the  given  planes  are  parallel. 

13. 4x - y + 2z = 5 and 7x - 3y + 4z = 8

Answer:

Not parallel

14. x - 4y - 3z - 2 = 0 and 3x - 12y - 9z - 7 = 0
15-  2y  = 8x  - Az  + 5 and  x = ^ y 

Answer: 

Parallel 

16. (-4, 1, 2) · (x, y, z) = 0 and (8, -2, -4) · (x, y, z) = 0

In  Exercises  17-18,  determine  whether  the  given  planes  are  perpendicular. 

17. 3x - y + z - 4 = 0, x + 2z = -1

Answer:

Not perpendicular

18. x - 2y + 3z = 4, -2x + 5y + 4z = -1

In Exercises 19-20, find ||proj_a u||.

19. (a) u = (1, -2), a = (-4, -3)

(b) u = (3, 0, 4), a = (2, 3, 3)

Answer:

(a) 2/5

(b) 18/√22

20. (a)  u=  (5,  6),  a =(2,  -1) 

(b)  u=  (3,  -2,6),  a=  (1,  2,  -7) 

In Exercises 21-28, find the vector component of u along a and the vector component of u orthogonal to a.

21. u = (6, 2), a = (3, -9)

Answer:

(0, 0), (6, 2)

22. u = (-1, -2), a = (-2, 3)

23. u = (3, 1, -7), a = (1, 0, 5)

Answer:

(-16/13, 0, -80/13), (55/13, 1, -11/13)

24. u = (1, 0, 0), a = (4, 3, 8)

25. u = (1, 1, 1), a = (0, 2, -1)

Answer:

(0, 2/5, -1/5), (1, 3/5, 6/5)

26. u = (2, 0, 1), a = (1, 2, 3)

27. u = (2, 1, 1, 2), a = (4, -4, 2, -2)

Answer:

(1/5, -1/5, 1/10, -1/10), (9/5, 6/5, 9/10, 21/10)

28. u = (5, 0, -3, 7), a = (2, 1, -1, -1)

In Exercises 29-32, find the distance between the point and the line.

29. 4x + 3y + 4 = 0; (-3, 1)

Answer:

1

30. x - 3y + 2 = 0; (-1, 4)

31. y = -4x + 2; (2, -5)

Answer:

1/√17

32. 3x + y = 5; (1, 8)

In  Exercises  33-36,  find  the  distance  between  the  point  and  the  plane. 


33. (3, 1, -2); x + 2y - 2z = 4

Answer:

5/3

34. (-1, -1, 2); 2x + 5y - 6z = 4

35. (-1, 2, 1); 2x + 3y - 4z = 1

Answer:

1/√29

36. (0, 3, -2); x - y - z = 3

In Exercises 37-40, find the distance between the given parallel planes.

37. 2x - y - z = 5 and -4x + 2y + 2z = 12

Answer:

11/√6

38. 3x - 4y + z = 1 and 6x - 8y + 2z = 3

39. -4x + y - 3z = 0 and 8x - 2y + 6z = 0

Answer:

0 (The planes coincide.)

40. 2x - y + z = 1 and 2x - y + z = -1

41. Let i, j, and k be unit vectors along the positive x, y, and z axes of a rectangular coordinate system in 3-space. If v = (a, b, c) is a nonzero vector, then the angles α, β, and γ between v and the vectors i, j, and k, respectively, are called the direction angles of v (Figure Ex-41), and the numbers cos α, cos β, and cos γ are called the direction cosines of v.

(a) Show that cos α = a / ||v||.

(b) Find cos β and cos γ.

(c) Show that v / ||v|| = (cos α, cos β, cos γ).

(d) Show that cos^2 α + cos^2 β + cos^2 γ = 1.

Figure Ex-41

Answer:


42. Use the result in Exercise 41 to estimate, to the nearest degree, the angles that a diagonal of a box with dimensions 10 cm × 15 cm × 25 cm makes with the edges of the box.

43. Show that if v is orthogonal to both w1 and w2, then v is orthogonal to k1w1 + k2w2 for all scalars k1 and k2.

44. Let u and v be nonzero vectors in 2- or 3-space, and let k = ||u|| and l = ||v||. Show that the vector w = lu + kv bisects the angle between u and v.

45.  Prove  part  (a)  of  Theorem  3.3.4. 

46.  Is  it  possible  to  have 

proj_a u = proj_u a ?

Explain  your  reasoning. 

True-False  Exercises 

In  parts  (a)-(g)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  The  vectors  (3,  — 1,  2)  and  (0,  0,  0)  are  orthogonal. 

Answer: 

True 

(b) If u and v are orthogonal vectors, then for all nonzero scalars k and m, ku and mv are orthogonal vectors.

Answer: 

True 

(c)  The  orthogonal  projection  of  u along  a is  perpendicular  to  the  vector  component  of  u orthogonal  to  a. 

Answer: 

True 

(d)  If  a and  b are  orthogonal  vectors,  then  for  every  nonzero  vector  u,  we  have 

proj_a(proj_b(u)) = 0


Answer: 

True 

(e)  If  a and  u are  nonzero  vectors,  then 

proja(proja(u))  =proja(u) 


Answer: 

True 

(f)  If  the  relationship 

projau  = projav 

holds for some nonzero vector a, then u = v.
Answer: 

False 

(g)  For  all  vectors  u and  v,  it  is  true  that 

||u + v|| = ||u|| + ||v||


Answer: 


False 
3.4  The  Geometry  of  Linear  Systems

In  this  section  we  will  use  parametric  and  vector  methods  to  study  general  systems  of  linear  equations.  This  work  will  enable  us  to  interpret 
solution  sets  of  linear  systems  with  n unknowns  as  geometric  objects  in  Rn  just  as  we  interpreted  solution  sets  of  linear  systems  with  two 
and three unknowns as points, lines, and planes in R2 and R3.


Vector  and  Parametric  Equations  of  Lines  in  R2  and  R3 

In the last section we derived equations of lines and planes that are determined by a point and a normal vector. However, there are other useful ways of specifying lines and planes. For example, a unique line in R2 or R3 is determined by a point x0 on the line and a nonzero vector v parallel to the line, and a unique plane in R3 is determined by a point x0 in the plane and two noncollinear vectors v1 and v2 parallel to the plane. The best way to visualize this is to translate the vectors so their initial points are at x0 (Figure 3.4.1).


Figure  3.4.1 

Let us begin by deriving an equation for the line L that contains the point x0 and is parallel to v. If x is a general point on such a line, then, as illustrated in Figure 3.4.2, the vector x - x0 will be some scalar multiple of v, say

x - x0 = tv or equivalently x = x0 + tv

As the variable t (called a parameter) varies from -∞ to ∞, the point x traces out the line L. Accordingly, we have the following result.


THEOREM  3.4.1 

Let L be the line in R2 or R3 that contains the point x0 and is parallel to the nonzero vector v. Then the equation of the line through
x0 that is parallel to v is

x = x0 + tv (1)

If x0 = 0, then the line passes through the origin and the equation has the form

x = tv (2)


Although it is not stated explicitly, it is understood in
Formulas 1 and 2 that the parameter t varies from -∞ to ∞.
This applies to all vector and parametric equations in this text
except where stated otherwise.


Figure  3.4.2 


Vector  and  Parametric  Equations  of  Planes  in  R3 

Next we will derive an equation for the plane W that contains the point x0 and is parallel to the noncollinear vectors v1 and v2. As shown in
Figure 3.4.3, if x is any point in the plane, then by forming suitable scalar multiples of v1 and v2, say t1 v1 and t2 v2, we can create a
parallelogram with diagonal x - x0 and adjacent sides t1 v1 and t2 v2. Thus, we have

x - x0 = t1 v1 + t2 v2 or equivalently x = x0 + t1 v1 + t2 v2


Figure  3.4.3 

As the variables t1 and t2 (called parameters) vary independently from -∞ to ∞, the point x varies over the entire plane W. Accordingly,
we have the following result.


THEOREM  3.4.2 

Let W be the plane in R3 that contains the point x0 and is parallel to the noncollinear vectors v1 and v2. Then an equation of the
plane through x0 that is parallel to v1 and v2 is given by

x = x0 + t1 v1 + t2 v2 (3)


If x0 = 0, then the plane passes through the origin and the equation has the form

x = t1 v1 + t2 v2 (4)


Observe that the line through x0 represented by Equation 1 is the translation by x0 of the line through the origin represented by
Equation 2 and that the plane through x0 represented by Equation 3 is the translation by x0 of the plane through the origin represented by
Equation 4 (Figure 3.4.4).


Figure 3.4.4


Motivated  by  the  forms  of  Formulas  1 to  4,  we  can  extend  the  notions  of  line  and  plane  to  Rn  by  making  the  following  definitions. 


DEFINITION  1 

If x0 and v are vectors in Rn, and if v is nonzero, then the equation

x = x0 + tv (5)

defines the line through x0 that is parallel to v. In the special case where x0 = 0, the line is said to pass through the origin.


DEFINITION  2 

If x0, v1, and v2 are vectors in Rn, and if v1 and v2 are not collinear, then the equation

x = x0 + t1 v1 + t2 v2 (6)

defines the plane through x0 that is parallel to v1 and v2. In the special case where x0 = 0, the plane is said to pass through the
origin.

Equations  5 and  6 are  called  vector  forms  of  a line  and  plane  in  Rn.  If  the  vectors  in  these  equations  are  expressed  in  terms  of  their 
components  and  the  corresponding  components  on  each  side  are  equated,  then  the  resulting  equations  are  called  parametric  equations  of 
the  line  and  plane.  Here  are  some  examples. 


EXAMPLE 1 Vector and Parametric Equations of Lines in R2 and R3

(a) Find a vector equation and parametric equations of the line in R2 that passes through the origin and is parallel to the
vector v = (-2, 3).

(b) Find a vector equation and parametric equations of the line in R3 that passes through the point P0(1, 2, -3) and is
parallel to the vector v = (4, -5, 1).

(c) Use the vector equation obtained in part (b) to find two points on the line that are different from P0.

Solution

(a) It follows from 5 with x0 = 0 that a vector equation of the line is x = tv. If we let x = (x, y), then this equation can be
expressed in vector form as

(x, y) = t(-2, 3)

Equating corresponding components on the two sides of this equation yields the parametric equations

x = -2t, y = 3t

(b) It follows from 5 that a vector equation of the line is x = x0 + tv. If we let x = (x, y, z), and if we take
x0 = (1, 2, -3), then this equation can be expressed in vector form as

(x, y, z) = (1, 2, -3) + t(4, -5, 1) (7)

Equating corresponding components on the two sides of this equation yields the parametric equations

x = 1 + 4t, y = 2 - 5t, z = -3 + t

(c) A point on the line represented by Equation 7 can be obtained by substituting a specific numerical value for the
parameter t. However, since t = 0 produces (x, y, z) = (1, 2, -3), which is the point P0, this value of t does not serve
our purpose. Taking t = 1 produces the point (5, -3, -2) and taking t = -1 produces the point (-3, 7, -4). Any
other distinct values for t (except t = 0) would work just as well.
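
The substitution used in part (c) is mechanical enough to script. The following Python sketch is only an illustration: the helper name point_on_line is our own and is not notation from the text. It evaluates x = x0 + tv componentwise for the data of part (b).

```python
def point_on_line(x0, v, t):
    """Return the point x0 + t*v on the line through x0 parallel to v."""
    if len(x0) != len(v):
        raise ValueError("x0 and v must have the same number of components")
    if all(c == 0 for c in v):
        raise ValueError("the direction vector v must be nonzero")
    return tuple(p + t * d for p, d in zip(x0, v))


# Data from Example 1(b): the point P0(1, 2, -3) and v = (4, -5, 1).
x0 = (1, 2, -3)
v = (4, -5, 1)

for t in (0, 1, -1):
    print(t, point_on_line(x0, v, t))
```

With t = 0 the function returns P0 itself, while t = 1 and t = -1 give the two points found in part (c). Because the function works componentwise, the same call applies unchanged to lines in Rn.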


EXAMPLE 2 Vector and Parametric Equations of a Plane in R3

Find vector and parametric equations of the plane x - y + 2z = 5.

Solution We will find the parametric equations first. We can do this by solving the equation for any one of the variables in
terms of the other two and then using those two variables as parameters. For example, solving for x in terms of y and z yields

x = 5 + y - 2z (8)


and then using y and z as parameters t1 and t2, respectively, yields the parametric equations

x = 5 + t1 - 2t2, y = t1, z = t2


We would have obtained different parametric and
vector equations in Example 2 had we solved 8 for y or
z rather than x. However, one can show the same plane
results in all three cases as the parameters vary from
-∞ to ∞.


To obtain a vector equation of the plane we rewrite these parametric equations as

(x, y, z) = (5 + t1 - 2t2, t1, t2)


or, equivalently, as


(x, y, z) = (5, 0, 0) + t1(1, 1, 0) + t2(-2, 0, 1)
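
One way to gain confidence in a vector equation of a plane is to check that the points it produces satisfy the original equation. The Python sketch below does this for Example 2; the function names and the grid of parameter values are our own choices for illustration, and a finite check of this kind is evidence rather than a proof.

```python
def point_on_plane(x0, v1, v2, t1, t2):
    """Return x0 + t1*v1 + t2*v2, a point of the plane through x0 parallel to v1 and v2."""
    return tuple(p + t1 * a + t2 * b for p, a, b in zip(x0, v1, v2))


def on_example_plane(point):
    """Test whether a point (x, y, z) satisfies x - y + 2z = 5."""
    x, y, z = point
    return x - y + 2 * z == 5


# Vector equation from Example 2: (5, 0, 0) + t1(1, 1, 0) + t2(-2, 0, 1).
x0, v1, v2 = (5, 0, 0), (1, 1, 0), (-2, 0, 1)

# Sample a small grid of integer parameter values (an arbitrary choice).
samples = [point_on_plane(x0, v1, v2, t1, t2)
           for t1 in range(-3, 4) for t2 in range(-3, 4)]
print(all(on_example_plane(p) for p in samples))
```

The check succeeds for every sampled pair because substituting x = 5 + t1 - 2t2, y = t1, z = t2 into x - y + 2z gives 5 identically.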


EXAMPLE 3 Vector and Parametric Equations of Lines and Planes in R4

(a) Find vector and parametric equations of the line through the origin of R4 that is parallel to the vector v = (5, -3, 6, 1).

(b) Find vector and parametric equations of the plane in R4 that passes through the point x0 = (2, -1, 0, 3) and is parallel
to both v1 = (1, 5, 2, -4) and v2 = (0, 7, -8, 6).

Solution

(a) If we let x = (x1, x2, x3, x4), then the vector equation x = tv can be expressed as

(x1, x2, x3, x4) = t(5, -3, 6, 1)

Equating corresponding components yields the parametric equations

x1 = 5t, x2 = -3t, x3 = 6t, x4 = t


(b) The vector equation x = x0 + t1 v1 + t2 v2 can be expressed as

(x1, x2, x3, x4) = (2, -1, 0, 3) + t1(1, 5, 2, -4) + t2(0, 7, -8, 6)

which yields the parametric equations

x1 = 2 + t1

x2 = -1 + 5t1 + 7t2

x3 = 2t1 - 8t2

x4 = 3 - 4t1 + 6t2


Lines Through Two Points in Rn

If x0 and x1 are distinct points in Rn, then the line determined by these points is parallel to the vector v = x1 - x0 (Figure 3.4.5), so it
follows from 5 that the line can be expressed in vector form as

x = x0 + t(x1 - x0) (9)

or, equivalently, as

x = (1 - t)x0 + tx1 (10)

These are called the two-point vector equations of a line in Rn.

EXAMPLE 4 A Line Through Two Points in R2

Find vector and parametric equations for the line in R2 that passes through the points P(0, 7) and Q(5, 0).

Solution We will see below that it does not matter which point we take to be x0 and which we take to be x1, so let us
choose x0 = (0, 7) and x1 = (5, 0). It follows that x1 - x0 = (5, -7) and hence that


(x, y) = (0, 7) + t(5, -7) (11)

which we can rewrite in parametric form as

x = 5t, y = 7 - 7t

Had we reversed our choices and taken x0 = (5, 0) and x1 = (0, 7), then the resulting vector equation would have been

(x, y) = (5, 0) + t(-5, 7) (12)

and the parametric equations would have been

x = 5 - 5t, y = 7t

(verify). Although 11 and 12 look different, they both represent the line whose equation in rectangular coordinates is

7x + 5y = 35

(Figure 3.4.6). This can be seen by eliminating the parameter t from the parametric equations (verify).
Figure 3.4.6
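
The two "verify" steps in Example 4 can also be checked numerically. The Python sketch below is a hedged illustration (the function names and the sample parameter values are our own): it uses the two-point form x = (1 - t)x0 + tx1 with exact rational arithmetic and tests that both orderings of P and Q produce points on 7x + 5y = 35.

```python
from fractions import Fraction


def two_point_line(x0, x1, t):
    """Return (1 - t)*x0 + t*x1, the two-point form of the line through x0 and x1."""
    return tuple((1 - t) * a + t * b for a, b in zip(x0, x1))


def on_rectangular_line(point):
    """Test whether (x, y) satisfies 7x + 5y = 35."""
    x, y = point
    return 7 * x + 5 * y == 35


P, Q = (0, 7), (5, 0)
# Exact rational parameter values, chosen arbitrarily for the check.
ts = [Fraction(n, 4) for n in range(-8, 9)]

print(all(on_rectangular_line(two_point_line(P, Q, t)) for t in ts))
print(all(on_rectangular_line(two_point_line(Q, P, t)) for t in ts))
# Reversing the roles of the two points replaces t by 1 - t.
print(all(two_point_line(P, Q, t) == two_point_line(Q, P, 1 - t) for t in ts))
```

The last check shows why the choice of x0 and x1 does not matter: the two parametrizations trace the same points, only in opposite directions.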


The point x = (x, y) in Equations 9 and 10 traces an entire line in R2 as the parameter t varies over the interval (-∞, ∞). If, however,
we restrict the parameter to vary from t = 0 to t = 1, then x will not trace the entire line but rather just the line segment joining the points
x0 and x1. The point x will start at x0 when t = 0 and end at x1 when t = 1. Accordingly, we make the following definition.


DEFINITION  3 

If x0 and x1 are vectors in Rn, then the equation

x = x0 + t(x1 - x0) (0 ≤ t ≤ 1) (13)

defines the line segment from x0 to x1. When convenient, Equation 13 can be written as

x = (1 - t)x0 + tx1 (0 ≤ t ≤ 1) (14)


EXAMPLE 5 A Line Segment from One Point to Another in R2

It follows from 13 and 14 that the line segment in R2 from x0 = (1, -3) to x1 = (5, 6) can be represented either by the
equation

x = (1, -3) + t(4, 9) (0 ≤ t ≤ 1)

or by

x = (1 - t)(1, -3) + t(5, 6) (0 ≤ t ≤ 1)
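
As a small illustration of Definition 3, the following Python sketch evaluates both forms of the segment in Example 5 and enforces the restriction 0 ≤ t ≤ 1. The function names are hypothetical, and exact fractions are used so that the two forms can be compared without rounding.

```python
from fractions import Fraction


def segment_point(x0, x1, t):
    """Return x0 + t*(x1 - x0) for 0 <= t <= 1, as in Formula 13."""
    if not 0 <= t <= 1:
        raise ValueError("t must lie in the interval [0, 1]")
    return tuple(a + t * (b - a) for a, b in zip(x0, x1))


def segment_point_alt(x0, x1, t):
    """Return (1 - t)*x0 + t*x1 for 0 <= t <= 1, as in Formula 14."""
    if not 0 <= t <= 1:
        raise ValueError("t must lie in the interval [0, 1]")
    return tuple((1 - t) * a + t * b for a, b in zip(x0, x1))


# Endpoints from Example 5.
x0, x1 = (1, -3), (5, 6)
for t in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(t, segment_point(x0, x1, t), segment_point_alt(x0, x1, t))
```

The parameter values t = 0 and t = 1 return the endpoints, and t = 1/2 returns the midpoint of the segment; values of t outside the interval are rejected because they correspond to points of the line that lie beyond the segment.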


Dot  Product  Form  of  a Linear  System 

Our  next  objective  is  to  show  how  to  express  linear  equations  and  linear  systems  in  dot  product  notation.  This  will  lead  us  to  some 
important  results  about  orthogonality  and  linear  systems. 

Recall that a linear equation in the variables x1, x2, ..., xn has the form

a1 x1 + a2 x2 + ... + an xn = b (a1, a2, ..., an not all zero) (15)

and that the corresponding homogeneous equation is

a1 x1 + a2 x2 + ... + an xn = 0 (a1, a2, ..., an not all zero) (16)

These equations can be rewritten in vector form by letting

a = (a1, a2, ..., an) and x = (x1, x2, ..., xn)

in which case Formula 15 can be written as

a · x = b (17)

and Formula 16 as

a · x = 0 (18)


Except  for  a notational  change  from  n to  a,  Formula  18  is  the  extension  to  Rn  of  Formula  6 in  Section  3.3.  This  equation  reveals  that  each 
solution  vector  x of  a homogeneous  equation  is  orthogonal  to  the  coefficient  vector  a.  To  take  this  geometric  observation  a step  further, 
consider  the  homogeneous  system 


a11 x1 + a12 x2 + ... + a1n xn = 0
a21 x1 + a22 x2 + ... + a2n xn = 0
  ...
am1 x1 + am2 x2 + ... + amn xn = 0

If we denote the successive row vectors of the coefficient matrix by r1, r2, ..., rm, then we can rewrite this system in dot product form as

r1 · x = 0
r2 · x = 0
  ...
rm · x = 0 (19)

from which we see that every solution vector x is orthogonal to every row vector of the coefficient matrix. In summary, we have the
following result.


THEOREM  3.4.3 

If A is an m x n matrix, then the solution set of the homogeneous linear system Ax = 0 consists of all vectors in Rn that are
orthogonal to every row vector of A.


EXAMPLE  6 Orthogonality  of  Row  Vectors  and  Solution  Vectors 


We showed in Example 6 of Section 1.2 that the general solution of the homogeneous linear system

x1 + 3x2 - 2x3 + 2x5 = 0
2x1 + 6x2 - 5x3 - 2x4 + 4x5 - 3x6 = 0
5x3 + 10x4 + 15x6 = 0
2x1 + 6x2 + 8x4 + 4x5 + 18x6 = 0

is

x1 = -3r - 4s - 2t, x2 = r, x3 = -2s, x4 = s, x5 = t, x6 = 0


which we can rewrite in vector form as

x = (-3r - 4s - 2t, r, -2s, s, t, 0)


According to Theorem 3.4.3, the vector x must be orthogonal to each of the row vectors

r1 = (1, 3, -2, 0, 2, 0)
r2 = (2, 6, -5, -2, 4, -3)
r3 = (0, 0, 5, 10, 0, 15)
r4 = (2, 6, 0, 8, 4, 18)


We will confirm that x is orthogonal to r1, and leave it for you to verify that x is orthogonal to the other three row vectors as
well. The dot product of r1 and x is

r1 · x = 1(-3r - 4s - 2t) + 3(r) + (-2)(-2s) + 0(s) + 2(t) + 0(0) = 0

which  establishes  the  orthogonality. 
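
The remaining three verifications in Example 6 can be spot-checked with a few lines of Python. This is a sketch under stated assumptions: the names dot and solution are our own, and the parameter triples are arbitrary sample values, so the check illustrates Theorem 3.4.3 rather than proving it.

```python
def dot(a, b):
    """Euclidean dot product of two vectors with the same number of components."""
    return sum(p * q for p, q in zip(a, b))


# Row vectors of the coefficient matrix in Example 6.
rows = [
    (1, 3, -2, 0, 2, 0),
    (2, 6, -5, -2, 4, -3),
    (0, 0, 5, 10, 0, 15),
    (2, 6, 0, 8, 4, 18),
]


def solution(r, s, t):
    """General solution of the homogeneous system for parameter values r, s, t."""
    return (-3 * r - 4 * s - 2 * t, r, -2 * s, s, t, 0)


# A few arbitrary parameter choices; every dot product should vanish.
for r, s, t in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, -5, 7)]:
    x = solution(r, s, t)
    print((r, s, t), [dot(row, x) for row in rows])
```

Each printed list contains one dot product per row vector, and all of them are zero, in agreement with the theorem.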


The  Relationship  Between  Ax  = 0 and  Ax  = b 


We will conclude this section by exploring the relationship between the solutions of a homogeneous linear system Ax = 0 and the solutions
(if any) of a nonhomogeneous linear system Ax = b that has the same coefficient matrix. These are called corresponding linear systems.


To  motivate  the  result  we  are  seeking,  let  us  compare  the  solutions  of  the  corresponding  linear  systems 
We showed in Example 5 and Example 6 of Section 1.2 that the general solutions of these linear systems can be written in parametric form
as

homogeneous → x1 = -3r - 4s - 2t, x2 = r, x3 = -2s, x4 = s, x5 = t, x6 = 0

nonhomogeneous → x1 = -3r - 4s - 2t, x2 = r, x3 = -2s, x4 = s, x5 = t, x6 = 1/3

which we can then rewrite in vector form as

homogeneous → (x1, x2, x3, x4, x5, x6) = (-3r - 4s - 2t, r, -2s, s, t, 0)

nonhomogeneous → (x1, x2, x3, x4, x5, x6) = (-3r - 4s - 2t, r, -2s, s, t, 1/3)

By splitting the vectors on the right apart and collecting terms with like parameters, we can rewrite these equations as

homogeneous → (x1, x2, x3, x4, x5, x6) = r(-3, 1, 0, 0, 0, 0) + s(-4, 0, -2, 1, 0, 0) + t(-2, 0, 0, 0, 1, 0) (20)

nonhomogeneous → (x1, x2, x3, x4, x5, x6) = r(-3, 1, 0, 0, 0, 0) + s(-4, 0, -2, 1, 0, 0) + t(-2, 0, 0, 0, 1, 0) + (0, 0, 0, 0, 0, 1/3) (21)

Formulas 20 and 21 reveal that each solution of the nonhomogeneous system can be obtained by adding the fixed vector (0, 0, 0, 0, 0, 1/3)
to the corresponding solution of the homogeneous system. This is a special case of the following general result.


THEOREM  3.4.4 

The general solution of a consistent linear system Ax = b can be obtained by adding any specific solution of Ax = b to the general
solution of Ax = 0.


Proof Let x0 be any specific solution of Ax = b, let W denote the solution set of Ax = 0, and let x0 + W denote the set of all vectors that
result by adding x0 to each vector in W. We must show that if x is a vector in x0 + W, then x is a solution of Ax = b, and conversely, that
every solution of Ax = b is in the set x0 + W.

Assume first that x is a vector in x0 + W. This implies that x is expressible in the form x = x0 + w, where Ax0 = b and Aw = 0. Thus,

Ax = A(x0 + w) = Ax0 + Aw = b + 0 = b


which shows that x is a solution of Ax = b.


Conversely, let x be any solution of Ax = b. To show that x is in the set x0 + W we must show that x is expressible in the form

x = x0 + w (22)

where w is in W (i.e., Aw = 0). We can do this by taking w = x - x0. This vector obviously satisfies 22, and it is in W since

Aw = A(x - x0) = Ax - Ax0 = b - b = 0


The solution set of Ax = b is a translation of the solution space of Ax = 0.


Theorem 3.4.4 has a useful geometric interpretation that is illustrated in Figure 3.4.7. If, as discussed in Section 3.1, we interpret
vector addition as translation, then the theorem states that if x0 is any specific solution of Ax = b, then the entire solution set of Ax = b can
be obtained by translating the solution set of Ax = 0 by the vector x0.
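
To make Theorem 3.4.4 concrete, the Python sketch below uses the single equation x + y + z = 1, which also appears in the exercise set for this section. The helper names are hypothetical, and the specific solutions (1, 0, 0) and (0, 0, 1) are simply convenient choices. The code checks both directions of the proof: x0 + w solves Ax = b whenever Aw = 0, and the difference of two solutions of Ax = b solves Ax = 0.

```python
def matvec(A, x):
    """Product of a matrix A (given as a list of rows) with a vector x."""
    return tuple(sum(a * c for a, c in zip(row, x)) for row in A)


def add(u, v):
    """Componentwise sum u + v."""
    return tuple(a + b for a, b in zip(u, v))


def sub(u, v):
    """Componentwise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))


A = [(1, 1, 1)]   # coefficient matrix of x + y + z = 1
b = (1,)
x0 = (1, 0, 0)    # one specific solution of Ax = b


def homogeneous(s, t):
    """General solution of Ax = 0, namely s(-1, 1, 0) + t(-1, 0, 1)."""
    return (-s - t, s, t)


# x0 + w solves Ax = b for every w in the solution set of Ax = 0.
for s, t in [(0, 0), (1, 2), (-3, 5)]:
    x = add(x0, homogeneous(s, t))
    print(x, matvec(A, x) == b)

# Conversely, the difference of two solutions of Ax = b solves Ax = 0.
x_other = (0, 0, 1)
print(matvec(A, sub(x_other, x0)))
```

Geometrically, the plane x + y + z = 1 is the plane x + y + z = 0 through the origin translated by the vector x0.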


Concept  Review 

Parameters 

Parametric  equations  of  lines 
Parametric  equations  of  planes 
Two-point  vector  equations  of  a line 
Vector  equation  of  a line 
Vector  equation  of  a plane 

Skills 

Express the equations of lines in R2 and R3 using either vector or parametric equations.

Express the equations of planes in Rn using either vector or parametric equations.

Express the equation of a line containing two given points in R2 or R3 using either vector or parametric equations.

Find equations of a line and a line segment.

Verify the orthogonality of the row vectors of a linear system of equations and a solution vector.

Use a specific solution to the nonhomogeneous linear system Ax = b and the general solution of the corresponding linear system
Ax = 0 to obtain the general solution to Ax = b.


Exercise  Set  3.4 

In  Exercises  1-4,  find  vector  and  parametric  equations  of  the  line  containing  the  point  and  parallel  to  the  vector. 
1. Point: (-4, 1); vector: v = (0, -8)

Answer:

Vector equation: (x, y) = (-4, 1) + t(0, -8);

parametric equations: x = -4, y = 1 - 8t

2. Point: (2, -1); vector: v = (-4, -2)

3. Point: (0, 0, 0); vector: v = (-3, 0, 1)

Answer:

Vector equation: (x, y, z) = t(-3, 0, 1);

parametric equations: x = -3t, y = 0, z = t

4. Point: (-9, 3, 4); vector: v = (-1, 6, 0)

In  Exercises  5-8,  use  the  given  equation  of  a line  to  find  a point  on  the  line  and  a vector  parallel  to  the  line. 

5. x = (3 - 5t, -6 - t)

Answer:

Point: (3, -6); parallel vector: (-5, -1)

6. (x, y, z) = (4t, 7, 4 + 3t)

7. x = (1 - t)(4, 6) + t(-2, 0)

Answer:

Point: (4, 6); parallel vector: (-6, -6)

8. x = (1 - t)(0, -5, 1)

In  Exercises  9-12,  find  vector  and  parametric  equations  of  the  plane  containing  the  given  point  and  parallel  vectors. 

9. Point: (-3, 1, 0); vectors: v1 = (0, -3, 6) and v2 = (-5, 1, 2)

Answer:

Vector equation: (x, y, z) = (-3, 1, 0) + t1(0, -3, 6) + t2(-5, 1, 2);

parametric equations: x = -3 - 5t2, y = 1 - 3t1 + t2, z = 6t1 + 2t2

10. Point: (0, 6, -2); vectors: v1 = (0, 9, -1) and v2 = (0, -3, 0)

11. Point: (-1, 1, 4); vectors: v1 = (6, -1, 0) and v2 = (-1, 3, 1)

Answer:

Vector equation: (x, y, z) = (-1, 1, 4) + t1(6, -1, 0) + t2(-1, 3, 1);

parametric equations: x = -1 + 6t1 - t2, y = 1 - t1 + 3t2, z = 4 + t2

12. Point: (0, 5, -4); vectors: v1 = (0, 0, -5) and v2 = (1, -3, -2)

In Exercises 13-14, find vector and parametric equations of the line in R2 that passes through the origin and is orthogonal to v.

13. v = (-2, 3)

Answer:

A possible answer is vector equation: (x, y) = t(3, 2);

parametric equations: x = 3t, y = 2t

14. v = (1, -4)

In Exercises 15-16, find vector and parametric equations of the plane in R3 that passes through the origin and is orthogonal to v.

15. v = (4, 0, -5) [Hint: Construct two nonparallel vectors orthogonal to v in R3.]


Answer:


A possible answer is vector equation: (x, y, z) = t1(0, 1, 0) + t2(5, 0, 4);

parametric equations: x = 5t2, y = t1, z = 4t2

16. v = (3, 1, -6)

In  Exercises  17-20,  find  the  general  solution  to  the  linear  system  and  confirm  that  the  row  vectors  of  the  coefficient  matrix  are  orthogonal 
to  the  solution  vectors. 

17. x1 + x2 + x3 = 0
2x1 + 2x2 + 2x3 = 0
3x1 + 3x2 + 3x3 = 0

Answer:

x1 = -s - t, x2 = s, x3 = t

18. x1 + 3x2 - 4x3 = 0
2x1 + 6x2 - 8x3 = 0

19. x1 + 5x2 + x3 + 2x4 - x5 = 0

x1 - 2x2 - x3 + 3x4 + 2x5 = 0

Answer:

x1 = (3/7)r - (19/7)s - (8/7)t, x2 = -(2/7)r + (1/7)s + (3/7)t, x3 = r, x4 = s, x5 = t

20. x1 + 3x2 - 4x3 = 0

x1 + 2x2 + 3x3 = 0

21. (a) The equation x + y + z = 1 can be viewed as a linear system of one equation in three unknowns. Express a general solution of this
equation as a particular solution plus a general solution of the associated homogeneous system.

(b) Give a geometric interpretation of the result in part (a).

Answer:

(a) (1, 0, 0) + s(-1, 1, 0) + t(-1, 0, 1)

(b) a plane in R3 passing through P(1, 0, 0) and parallel to (-1, 1, 0) and (-1, 0, 1)

22. (a) The equation x + y = 1 can be viewed as a linear system of one equation in two unknowns. Express a general solution of this
equation as a particular solution plus a general solution of the associated homogeneous system.

(b) Give a geometric interpretation of the result in part (a).

23. (a) Find a homogeneous linear system of two equations in three unknowns whose solution space consists of those vectors in R3 that are
orthogonal to a = (1, 1, 1) and b = (-2, 3, 0).

(b)  What  kind  of  geometric  object  is  the  solution  space? 

(c)  Find  a general  solution  of  the  system  obtained  in  part  (a),  and  confirm  that  Theorem  3.4.3  holds. 

Answer: 

(a) x + y + z = 0

-2x + 3y = 0

(b) a line through the origin in R3

(c) x = -(3/5)t, y = -(2/5)t, z = t

24. (a) Find a homogeneous linear system of two equations in three unknowns whose solution space consists of those vectors in R3 that are
orthogonal to a = (-3, 2, -1) and b = (0, -2, -2).

(b)  What  kind  of  geometric  object  is  the  solution  space? 


(c)  Find  a general  solution  of  the  system  obtained  in  part  (a),  and  confirm  that  Theorem  3.4.3  holds. 


25.  Consider  the  linear  systems 
(a) Find a general solution of the homogeneous system.

(b) Confirm that x1 = 1, x2 = 0, x3 = 1 is a solution of the nonhomogeneous system.

(c)  Use  the  results  in  parts  (a)  and  (b)  to  find  a general  solution  of  the  nonhomogeneous  system. 

(d)  Check  your  result  in  part  (c)  by  solving  the  nonhomogeneous  system  directly. 

Answer: 

(a) x1 = -(2/3)s + (1/3)t, x2 = s, x3 = t
(c) x1 = 1 - (2/3)s + (1/3)t, x2 = s, x3 = 1 + t
26.  Consider  the  linear  systems 
(a) Find a general solution of the homogeneous system.

(b) Confirm that x1 = 1, x2 = 1, x3 = 1 is a solution of the nonhomogeneous system.

(c)  Use  the  results  in  parts  (a)  and  (b)  to  find  a general  solution  of  the  nonhomogeneous  system. 

(d)  Check  your  result  in  part  (c)  by  solving  the  nonhomogeneous  system  directly. 

In  Exercises  27-28,  find  a general  solution  of  the  system,  and  use  that  solution  to  find  a general  solution  of  the  associated  homogeneous 
system  and  a particular  solution  of  the  given  system. 
Answer:

x1 = 1/3 - (4/3)s - (1/3)t, x2 = s, x3 = t, x4 = 1; the general solution of the associated homogeneous system is
x1 = -(4/3)s - (1/3)t, x2 = s, x3 = t, x4 = 0. A particular solution of the given system is x1 = 1/3, x2 = 0, x3 = 0, x4 = 1.
True-False Exercises


In  parts  (a)-(f)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 


(a)  The  vector  equation  of  a line  can  be  determined  from  any  point  lying  on  the  line  and  a nonzero  vector  parallel  to  the  line. 
Answer: 

True 

(b)  The  vector  equation  of  a plane  can  be  determined  from  any  point  lying  in  the  plane  and  a nonzero  vector  parallel  to  the  plane. 
Answer: 

False 

(c) The points lying on a line through the origin in R2 or R3 are all scalar multiples of any nonzero vector on the line.

Answer: 

True 

(d) All solution vectors of the linear system Ax = b are orthogonal to the row vectors of the matrix A if and only if b = 0.
Answer: 

True 

(e) The general solution of the nonhomogeneous linear system Ax = b can be obtained by adding b to the general solution of the
homogeneous linear system Ax = 0.

Answer: 

False 

(f) If x1 and x2 are two solutions of the nonhomogeneous linear system Ax = b, then x1 - x2 is a solution of the corresponding
homogeneous linear system.

Answer: 

True 
3.5 Cross Product

This optional section is concerned with properties of vectors in 3-space that are important to physicists and
engineers. It can be omitted, if desired, since subsequent sections do not depend on its content. Among other
things, we define an operation that provides a way of constructing a vector in 3-space that is perpendicular to two
given vectors, and we give a geometric interpretation of 3 x 3 determinants.


Cross  Product  of  Vectors 

In Section 3.2 we defined the dot product of two vectors u and v in n-space. That operation produced a scalar as its
result. We will now define a type of vector multiplication that produces a vector as the result but which is
applicable only to vectors in 3-space.


DEFINITION  1 


If u = (u1, u2, u3) and v = (v1, v2, v3) are vectors in 3-space, then the cross product u x v is the vector
defined by

u x v = (u2v3 - u3v2, u3v1 - u1v3, u1v2 - u2v1)


or, in determinant notation,


u x v = ( |u2 u3|    |u1 u3|   |u1 u2| )
        ( |v2 v3|, - |v1 v3|,  |v1 v2| )   (1)


Instead of memorizing 1, you can obtain the components of u x v as follows:


Form the 2 x 3 matrix

[ u1 u2 u3 ]
[ v1 v2 v3 ]

whose first row contains the components of u and whose second row contains the components of v.


To find the first component of u x v, delete the first column and take the determinant; to find the second
component, delete the second column and take the negative of the determinant; and to find the third component,
delete the third column and take the determinant.


EXAMPLE  1 Calculating  a Cross  Product 

Find u x v, where u = (1, 2, -2) and v = (3, 0, 1).

Solution From either 1 or the mnemonic in the preceding remark, we have

u x v = ( |2 -2|    |1 -2|   |1 2| )
        ( |0  1|, - |3  1|,  |3 0| )  = (2, -7, -6)
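
Formula 1 translates directly into code. The following Python sketch is an illustration only (the function name cross is our own choice); it computes the three components exactly as in the definition and reproduces the result of Example 1.

```python
def cross(u, v):
    """Cross product of two vectors in 3-space, computed from Formula 1."""
    if len(u) != 3 or len(v) != 3:
        raise ValueError("the cross product is defined only for vectors in 3-space")
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2,   # delete column 1, take the determinant
            u3 * v1 - u1 * v3,   # delete column 2, take the negative determinant
            u1 * v2 - u2 * v1)   # delete column 3, take the determinant


# Vectors from Example 1.
u = (1, 2, -2)
v = (3, 0, 1)
print(cross(u, v))  # (2, -7, -6)
```

The length check reflects the remark above that this product is applicable only to vectors in 3-space.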


The  following  theorem  gives  some  important  relationships  between  the  dot  product  and  cross  product  and  also 
shows  that  u x v is  orthogonal  to  both  u and  v. 


The cross product notation A x B was introduced by the American physicist and
mathematician J. Willard Gibbs (see p. 134) in a series of unpublished lecture notes for his students at Yale
University. It appeared in a published work for the first time in the second edition of the book Vector
Analysis by Edwin Wilson (1879-1964), a student of Gibbs. Gibbs originally referred to
A x B as the "skew product."


THEOREM 3.5.1 Relationships Involving Cross Product and Dot Product


If u, v, and w are vectors in 3-space, then

(a) u · (u x v) = 0 (u x v is orthogonal to u)

(b) v · (u x v) = 0 (u x v is orthogonal to v)

(c) ||u x v||^2 = ||u||^2 ||v||^2 - (u · v)^2 (Lagrange's identity)

(d) u x (v x w) = (u · w)v - (u · v)w (relationship between cross and dot products)

(e) (u x v) x w = (u · w)v - (v · w)u (relationship between cross and dot products)

Proof (a) Let u = (u1, u2, u3) and v = (v1, v2, v3). Then

u · (u x v) = (u1, u2, u3) · (u2v3 - u3v2, u3v1 - u1v3, u1v2 - u2v1)

= u1(u2v3 - u3v2) + u2(u3v1 - u1v3) + u3(u1v2 - u2v1) = 0

Proof (b) Similar to (a).

Proof (c) Since

||u x v||^2 = (u2v3 - u3v2)^2 + (u3v1 - u1v3)^2 + (u1v2 - u2v1)^2 (2)

and

||u||^2 ||v||^2 - (u · v)^2 = (u1^2 + u2^2 + u3^2)(v1^2 + v2^2 + v3^2) - (u1v1 + u2v2 + u3v3)^2 (3)

the proof can be completed by "multiplying out" the right sides of 2 and 3 and verifying their equality.

Proof (d) and (e) See Exercises 38 and 39.


EXAMPLE 2 u x v Is Perpendicular to u and to v


Consider the vectors

u = (1, 2, -2) and v = (3, 0, 1)


In Example 1 we showed that

u x v = (2, -7, -6)

Since

u · (u x v) = (1)(2) + (2)(-7) + (-2)(-6) = 0

and

v · (u x v) = (3)(2) + (0)(-7) + (1)(-6) = 0

u x v is orthogonal to both u and v, as guaranteed by Theorem 3.5.1.
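
The five relationships between the cross product and the dot product can be spot-checked in the same way. In the Python sketch below, u and v are the vectors of Example 2, while the third vector w = (0, 1, 4) and all helper names are our own assumptions, introduced only for illustration. A check on particular vectors illustrates the identities but does not replace their proofs.

```python
def dot(a, b):
    """Euclidean dot product."""
    return sum(p * q for p, q in zip(a, b))


def cross(u, v):
    """Cross product of two vectors in 3-space."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)


def scale(c, v):
    """Scalar multiple c*v."""
    return tuple(c * x for x in v)


def sub(a, b):
    """Componentwise difference a - b."""
    return tuple(p - q for p, q in zip(a, b))


u, v = (1, 2, -2), (3, 0, 1)   # vectors from Example 2
w = (0, 1, 4)                  # hypothetical third vector

uxv = cross(u, v)
checks = {
    "(a)": dot(u, uxv) == 0,
    "(b)": dot(v, uxv) == 0,
    # ||u x v||^2 is computed as (u x v) . (u x v), which avoids square roots.
    "(c)": dot(uxv, uxv) == dot(u, u) * dot(v, v) - dot(u, v) ** 2,
    "(d)": cross(u, cross(v, w)) == sub(scale(dot(u, w), v), scale(dot(u, v), w)),
    "(e)": cross(uxv, w) == sub(scale(dot(u, w), v), scale(dot(v, w), u)),
}
for label, ok in checks.items():
    print(label, ok)
```

For these vectors both sides of Lagrange's identity equal 89, since ||u||^2 = 9, ||v||^2 = 10, and u · v = 1.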
Joseph Louis Lagrange (1736-1813)

Joseph  Louis  Lagrange  was  a French-Italian  mathematician  and  astronomer.  Although 
his  father  wanted  him  to  become  a lawyer,  Lagrange  was  attracted  to  mathematics  and  astronomy  after 
reading  a memoir  by  the  astronomer  Halley.  At  age  16  he  began  to  study  mathematics  on  his  own  and  by 
age  19  was  appointed  to  a professorship  at  the  Royal  Artillery  School  in  Turin.  The  following  year  he 
solved  some  famous  problems  using  new  methods  that  eventually  blossomed  into  a branch  of  mathematics 
called  the  calculus  of  variations.  These  methods  and  Lagrange's  applications  of  them  to  problems  in 
celestial  mechanics  were  so  monumental  that  by  age  25  he  was  regarded  by  many  of  his  contemporaries  as 
the  greatest  living  mathematician.  One  of  Lagrange's  most  famous  works  is  a memoir,  Mecanique 
Analytique , in  which  he  reduced  the  theory  of  mechanics  to  a few  general  formulas  from  which  all  other 
necessary  equations  could  be  derived.  Napoleon  was  a great  admirer  of  Lagrange  and  showered  him  with 
many  honors.  In  spite  of  his  fame,  Lagrange  was  a shy  and  modest  man.  On  his  death,  he  was  buried  with 
honor  in  the  Pantheon. 

[Image:  ©SSPL/The  Image  Works ] 


The  main  arithmetic  properties  of  the  cross  product  are  listed  in  the  next  theorem. 


Properties  of  Cross  Product 

If  u,  v,  and  w are  any  vectors  in  3-space  and  k is  any  scalar,  then: 
(a) u x v = -(v x u)

(b) u x (v + w) = (u x v) + (u x w)

(c) (u + v) x w = (u x w) + (v x w)

(d) k(u x v) = (ku) x v = u x (kv)

(e) u x 0 = 0 x u = 0

(f) u x u = 0


The  proofs  follow  immediately  from  Formula  1 and  properties  of  determinants;  for  example,  part  (a)  can  be  proved 
as  follows. 

Proof (a) Interchanging u and v in 1 interchanges the rows of the three determinants on the right side of 1 and
hence changes the sign of each component in the cross product. Thus u x v = -(v x u).

The  proofs  of  the  remaining  parts  are  left  as  exercises. 

EXAMPLE  3 Standard  Unit  Vectors 


Consider the vectors

i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)

These vectors each have length 1 and lie along the coordinate axes (Figure 3.5.1). They are called the
standard unit vectors in 3-space. Every vector v = (v1, v2, v3) in 3-space is expressible in terms of
i, j, and k since we can write

v = (v1, v2, v3) = v1(1, 0, 0) + v2(0, 1, 0) + v3(0, 0, 1) = v1 i + v2 j + v3 k


For example,

(2, -3, 4) = 2i - 3j + 4k


From 1 we obtain

i x j = ( |0 0|    |1 0|   |1 0| )
        ( |1 0|, - |0 0|,  |0 1| )  = (0, 0, 1) = k


Figure 3.5.1  The standard unit vectors


You  should  have  no  trouble  obtaining  the  following  results: 

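\[ \mathbf{i} \times \mathbf{i} = \mathbf{0} \qquad \mathbf{j} \times \mathbf{j} = \mathbf{0} \qquad \mathbf{k} \times \mathbf{k} = \mathbf{0} \]

\[ \mathbf{i} \times \mathbf{j} = \mathbf{k} \qquad \mathbf{j} \times \mathbf{k} = \mathbf{i} \qquad \mathbf{k} \times \mathbf{i} = \mathbf{j} \]

\[ \mathbf{j} \times \mathbf{i} = -\mathbf{k} \qquad \mathbf{k} \times \mathbf{j} = -\mathbf{i} \qquad \mathbf{i} \times \mathbf{k} = -\mathbf{j} \]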

Figure  3.5.2  is  helpful  for  remembering  these  results.  Referring  to  this  diagram,  the  cross  product  of  two 
consecutive  vectors  going  clockwise  is  the  next  vector  around,  and  the  cross  product  of  two  consecutive  vectors 
going  counterclockwise  is  the  negative  of  the  next  vector  around. 


Figure 3.5.2


Determinant  Form  of  Cross  Product 

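It is also worth noting that a cross product can be represented symbolically in the form

\[ \mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix} = \begin{vmatrix} u_2 & u_3 \\ v_2 & v_3 \end{vmatrix} \mathbf{i} - \begin{vmatrix} u_1 & u_3 \\ v_1 & v_3 \end{vmatrix} \mathbf{j} + \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix} \mathbf{k} \tag{4} \]

For example, if $\mathbf{u} = (1, 2, -2)$ and $\mathbf{v} = (3, 0, 1)$, then

\[ \mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 2 & -2 \\ 3 & 0 & 1 \end{vmatrix} = 2\mathbf{i} - 7\mathbf{j} - 6\mathbf{k} \]

which agrees with the result obtained in Example 1.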
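
The symbolic determinant is expanded along its first row, so each component of the cross product is a signed 2 x 2 determinant. A minimal sketch of that expansion follows; the function names `det2` and `cross_by_cofactors` are illustrative assumptions.

```python
def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]]."""
    return a * d - b * c

def cross_by_cofactors(u, v):
    """Coefficients of i, j, k from the first-row cofactor expansion."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (det2(u2, u3, v2, v3),    # coefficient of i
            -det2(u1, u3, v1, v3),   # coefficient of j (note the minus sign)
            det2(u1, u2, v1, v2))    # coefficient of k

print(cross_by_cofactors((1, 2, -2), (3, 0, 1)))  # (2, -7, -6)
```

The printed triple matches the worked example, 2i - 7j - 6k.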


WARNING 

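It is not true in general that $\mathbf{u} \times (\mathbf{v} \times \mathbf{w}) = (\mathbf{u} \times \mathbf{v}) \times \mathbf{w}$. For example,

\[ \mathbf{i} \times (\mathbf{j} \times \mathbf{j}) = \mathbf{i} \times \mathbf{0} = \mathbf{0} \]

and

\[ (\mathbf{i} \times \mathbf{j}) \times \mathbf{j} = \mathbf{k} \times \mathbf{j} = -\mathbf{i} \]

so

\[ \mathbf{i} \times (\mathbf{j} \times \mathbf{j}) \neq (\mathbf{i} \times \mathbf{j}) \times \mathbf{j} \]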
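
The failure of associativity is easy to reproduce by machine. This short sketch recomputes both groupings from the warning and also confirms the cyclic products of the standard unit vectors; the helper `cross` is an assumed name.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)

left = cross(i, cross(j, j))    # i x (j x j)
right = cross(cross(i, j), j)   # (i x j) x j
print(left)    # (0, 0, 0)
print(right)   # (-1, 0, 0), that is, -i

# Cyclic products of the standard unit vectors
print(cross(i, j) == k, cross(j, k) == i, cross(k, i) == j)
```

Because the two groupings differ, parentheses in an iterated cross product cannot be dropped.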


We know from Theorem 3.5.1 that $\mathbf{u} \times \mathbf{v}$ is orthogonal to both u and v. If u and v are nonzero vectors, it can be
shown that the direction of $\mathbf{u} \times \mathbf{v}$ can be determined using the following “right-hand rule” (Figure 3.5.3): Let $\theta$ be
the angle between u and v, and suppose u is rotated through the angle $\theta$ until it coincides with v. If the fingers of
the right hand are cupped so that they point in the direction of rotation, then the thumb indicates (roughly) the
direction of $\mathbf{u} \times \mathbf{v}$.

Figure 3.5.3

You  may  find  it  instructive  to  practice  this  rule  with  the  products 

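\[ \mathbf{i} \times \mathbf{j} = \mathbf{k}, \qquad \mathbf{j} \times \mathbf{k} = \mathbf{i}, \qquad \mathbf{k} \times \mathbf{i} = \mathbf{j} \]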


Geometric  Interpretation  of  Cross  Product 

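If u and v are vectors in 3-space, then the norm of $\mathbf{u} \times \mathbf{v}$ has a useful geometric interpretation. Lagrange's identity,
given in Theorem 3.5.1, states that

\[ \|\mathbf{u} \times \mathbf{v}\|^2 = \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - (\mathbf{u} \cdot \mathbf{v})^2 \tag{5} \]

If $\theta$ denotes the angle between u and v, then $\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\| \|\mathbf{v}\| \cos\theta$, so Formula 5 can be rewritten as

\[ \begin{aligned} \|\mathbf{u} \times \mathbf{v}\|^2 &= \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 \cos^2\theta \\ &= \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 (1 - \cos^2\theta) \\ &= \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 \sin^2\theta \end{aligned} \]

Since $0 \le \theta \le \pi$, it follows that $\sin\theta \ge 0$, so this can be rewritten as

\[ \|\mathbf{u} \times \mathbf{v}\| = \|\mathbf{u}\| \|\mathbf{v}\| \sin\theta \tag{6} \]

But $\|\mathbf{v}\| \sin\theta$ is the altitude of the parallelogram determined by u and v (Figure 3.5.4). Thus, from Formula 6, the area A of
this parallelogram is given by

\[ A = (\text{base})(\text{altitude}) = \|\mathbf{u}\| \|\mathbf{v}\| \sin\theta = \|\mathbf{u} \times \mathbf{v}\| \]

This result is correct even if u and v are collinear, since the parallelogram determined by u and v then has zero area and
from Formula 6 we have $\mathbf{u} \times \mathbf{v} = \mathbf{0}$ because $\sin\theta = 0$ in this case ($\theta = 0$ or $\theta = \pi$). Thus we have the following theorem.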


THEOREM 3.5.3  Area of a Parallelogram


If u and v are vectors in 3-space, then $\|\mathbf{u} \times \mathbf{v}\|$ is equal to the area of the parallelogram determined by u
and v.
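
As a sanity check on the derivation, the sketch below evaluates both sides of Lagrange's identity and compares the norm of the cross product with the base-times-altitude expression for one sample pair of vectors. The helper names and the sample vectors are assumptions chosen for illustration.

```python
import math

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))

u, v = (1, 2, -2), (3, 0, 1)
n = cross(u, v)

# Lagrange's identity, evaluated exactly with integers
lagrange_lhs = dot(n, n)
lagrange_rhs = dot(u, u) * dot(v, v) - dot(u, v) ** 2
print(lagrange_lhs, lagrange_rhs)  # 89 89

# Area of the parallelogram two ways: ||u x v|| and ||u|| ||v|| sin(theta)
theta = math.acos(dot(u, v) / (norm(u) * norm(v)))
area_from_angle = norm(u) * norm(v) * math.sin(theta)
print(math.isclose(norm(n), area_from_angle))
```

The second comparison uses `math.isclose` because the angle is computed in floating point.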


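EXAMPLE 4  Area of a Triangle

Find the area of the triangle determined by the points $P_1(2, 2, 0)$, $P_2(-1, 0, 2)$, and $P_3(0, 4, 3)$.


Solution  The area A of the triangle is $\tfrac{1}{2}$ the area of the parallelogram determined by the vectors
$\overrightarrow{P_1P_2}$ and $\overrightarrow{P_1P_3}$ (Figure 3.5.5). Using the method discussed in Example 1 of Section 3.1,
$\overrightarrow{P_1P_2} = (-3, -2, 2)$ and $\overrightarrow{P_1P_3} = (-2, 2, 3)$. It follows that

\[ \overrightarrow{P_1P_2} \times \overrightarrow{P_1P_3} = (-10, 5, -10) \]

(verify) and consequently that

\[ A = \tfrac{1}{2} \left\| \overrightarrow{P_1P_2} \times \overrightarrow{P_1P_3} \right\| = \tfrac{1}{2}(15) = \tfrac{15}{2} \]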
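
The computation in Example 4 can be packaged as a small function: form two edge vectors from a common vertex, take their cross product, and halve its norm. The function names here are illustrative assumptions.

```python
import math

def sub(p, q):
    """Vector from point q to point p."""
    return tuple(a - b for a, b in zip(p, q))

def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def triangle_area(p1, p2, p3):
    """Half the area of the parallelogram spanned by P1P2 and P1P3."""
    n = cross(sub(p2, p1), sub(p3, p1))
    return 0.5 * math.sqrt(sum(c * c for c in n))

p1, p2, p3 = (2, 2, 0), (-1, 0, 2), (0, 4, 3)
print(cross(sub(p2, p1), sub(p3, p1)))  # (-10, 5, -10)
print(triangle_area(p1, p2, p3))        # 7.5
```

For points in 2-space, one option is to append a zero third coordinate before calling the function.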


DEFINITION  2 

If  u,  v,  and  w are  vectors  in  3 -space,  then 

u • (v  x w) 

is  called  the  scalar  triple  product  of  u,  v,  and  w. 


Figure 3.5.4


Figure 3.5.5


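The scalar triple product of $\mathbf{u} = (u_1, u_2, u_3)$, $\mathbf{v} = (v_1, v_2, v_3)$, and $\mathbf{w} = (w_1, w_2, w_3)$ can be calculated from the
formula

\[ \mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} \tag{7} \]

This follows from Formula 4 since

\[ \begin{aligned} \mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) &= \mathbf{u} \cdot \left( \begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix} \mathbf{i} - \begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix} \mathbf{j} + \begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix} \mathbf{k} \right) \\ &= \begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix} u_1 - \begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix} u_2 + \begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix} u_3 \\ &= \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} \end{aligned} \]


EXAMPLE 5  Calculating a Scalar Triple Product

Calculate the scalar triple product $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})$ of the vectors

\[ \mathbf{u} = 3\mathbf{i} - 2\mathbf{j} - 5\mathbf{k}, \qquad \mathbf{v} = \mathbf{i} + 4\mathbf{j} - 4\mathbf{k}, \qquad \mathbf{w} = 3\mathbf{j} + 2\mathbf{k} \]

Solution  From Formula 7,

\[ \begin{aligned} \mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) &= \begin{vmatrix} 3 & -2 & -5 \\ 1 & 4 & -4 \\ 0 & 3 & 2 \end{vmatrix} \\ &= 3 \begin{vmatrix} 4 & -4 \\ 3 & 2 \end{vmatrix} - (-2) \begin{vmatrix} 1 & -4 \\ 0 & 2 \end{vmatrix} + (-5) \begin{vmatrix} 1 & 4 \\ 0 & 3 \end{vmatrix} \\ &= 60 + 4 - 15 = 49 \end{aligned} \]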


The symbol $(\mathbf{u} \cdot \mathbf{v}) \times \mathbf{w}$ makes no sense because we cannot form the cross product of a scalar and a
vector. Thus, no ambiguity arises if we write $\mathbf{u} \cdot \mathbf{v} \times \mathbf{w}$ rather than $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})$. However, for clarity we will usually
keep the parentheses.


It follows from Formula 7 that

\[ \mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \mathbf{w} \cdot (\mathbf{u} \times \mathbf{v}) = \mathbf{v} \cdot (\mathbf{w} \times \mathbf{u}) \]

since the 3 x 3 determinants that represent these products can be obtained from one another by two row
interchanges. (Verify.) These relationships can be remembered by moving the vectors u, v, and w clockwise around
the vertices of the triangle in Figure 3.5.6.
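
The determinant formula and the cyclic identity can both be checked on the vectors of Example 5. The sketch below expands the 3 x 3 determinant along its first row; the function name `triple` is an assumption for illustration.

```python
def triple(u, v, w):
    """Scalar triple product u . (v x w) via first-row cofactor expansion."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

u, v, w = (3, -2, -5), (1, 4, -4), (0, 3, 2)

# Cyclic rearrangements leave the value unchanged (two row interchanges)
print(triple(u, v, w), triple(w, u, v), triple(v, w, u))  # 49 49 49

# A single interchange of two vectors flips the sign
print(triple(u, w, v))  # -49
```

The last line illustrates why only cyclic rearrangements preserve the value: swapping just two vectors is one row interchange, which changes the sign of the determinant.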


Figure 3.5.6


Geometric  Interpretation  of  Determinants 

The  next  theorem  provides  a useful  geometric  interpretation  of  2 x 2 and  3x3  determinants. 


THEOREM 3.5.4


(a) The absolute value of the determinant

\[ \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix} \]

is equal to the area of the parallelogram in 2-space determined by the vectors $\mathbf{u} = (u_1, u_2)$ and
$\mathbf{v} = (v_1, v_2)$. (See Figure 3.5.7a.)


(b) The absolute value of the determinant

\[ \det \begin{bmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{bmatrix} \]

is equal to the volume of the parallelepiped in 3-space determined by the vectors $\mathbf{u} = (u_1, u_2, u_3)$,
$\mathbf{v} = (v_1, v_2, v_3)$, and $\mathbf{w} = (w_1, w_2, w_3)$. (See Figure 3.5.7b.)


Figure 3.5.7


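Proof (a)  The key to the proof is to use Theorem 3.5.3. However, that theorem applies to vectors in 3-space,
whereas $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$ are vectors in 2-space. To circumvent this “dimension problem,” we will
view u and v as vectors in the xy-plane of an xyz-coordinate system (Figure 3.5.7c), in which case these vectors are
expressed as $\mathbf{u} = (u_1, u_2, 0)$ and $\mathbf{v} = (v_1, v_2, 0)$. Thus

\[ \mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & 0 \\ v_1 & v_2 & 0 \end{vmatrix} = \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix} \mathbf{k} = \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix} \mathbf{k} \]

It now follows from Theorem 3.5.3 and the fact that $\|\mathbf{k}\| = 1$ that the area A of the parallelogram determined by u
and v is

\[ A = \|\mathbf{u} \times \mathbf{v}\| = \left\| \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix} \mathbf{k} \right\| = \left| \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix} \right| \|\mathbf{k}\| = \left| \det \begin{bmatrix} u_1 & u_2 \\ v_1 & v_2 \end{bmatrix} \right| \]

which completes the proof.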


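Proof (b)  As shown in Figure 3.5.8, take the base of the parallelepiped determined by u, v, and w to be the
parallelogram determined by v and w. It follows from Theorem 3.5.3 that the area of the base is $\|\mathbf{v} \times \mathbf{w}\|$ and, as
illustrated in Figure 3.5.8, the height h of the parallelepiped is the length of the orthogonal projection of u on $\mathbf{v} \times \mathbf{w}$.
Therefore, by Formula 12 of Section 3.3,

\[ h = \|\operatorname{proj}_{\mathbf{v} \times \mathbf{w}} \mathbf{u}\| = \frac{|\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|}{\|\mathbf{v} \times \mathbf{w}\|} \]

It follows that the volume V of the parallelepiped is

\[ V = (\text{area of base}) \cdot \text{height} = \|\mathbf{v} \times \mathbf{w}\| \, \frac{|\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})|}{\|\mathbf{v} \times \mathbf{w}\|} = |\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})| \]

so from Formula 7,

\[ V = \left| \det \begin{bmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{bmatrix} \right| \tag{8} \]

which completes the proof.

Figure 3.5.8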
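
The argument in the proof of part (b) can be traced numerically: compute the base area, compute the height as the length of a projection, and compare their product with the absolute value of the scalar triple product. The sample vectors below are those of Exercise 17 in this section; the helper names are assumptions.

```python
import math

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

u, v, w = (2, -6, 2), (0, 4, -2), (2, 2, -4)

n = cross(v, w)                         # normal to the base
base_area = math.sqrt(dot(n, n))        # ||v x w||
height = abs(dot(u, n)) / base_area     # ||proj_{v x w} u||
volume = base_area * height

print(abs(dot(u, n)))   # 16, the exact value |u . (v x w)|
print(math.isclose(volume, 16))
```

The sketch assumes v and w are not collinear; otherwise the base area is zero and the height expression is undefined, although the volume is still 0 by Formula 8.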


If V denotes the volume of the parallelepiped determined by vectors u, v, and w, then it follows from
Formulas 7 and 8 that

\[ V = \begin{bmatrix} \text{volume of parallelepiped} \\ \text{determined by } \mathbf{u}, \mathbf{v}, \text{ and } \mathbf{w} \end{bmatrix} = |\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})| \tag{9} \]

From this result and the discussion immediately following Definition 3 of Section 3.2, we can conclude that

\[ \mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \pm V \]

where the + or - results depending on whether u makes an acute or an obtuse angle with $\mathbf{v} \times \mathbf{w}$.


Formula 9 leads to a useful test for ascertaining whether three given vectors lie in the same plane. Since three
vectors not in the same plane determine a parallelepiped of positive volume, it follows from Formula 9 that
$|\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})| = 0$ if and only if the vectors u, v, and w lie in the same plane. Thus we have the following result.


THEOREM  3.5.5 


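If the vectors $\mathbf{u} = (u_1, u_2, u_3)$, $\mathbf{v} = (v_1, v_2, v_3)$, and $\mathbf{w} = (w_1, w_2, w_3)$ have the same initial point, then
they lie in the same plane if and only if

\[ \mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} = 0 \]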
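
Theorem 3.5.5 translates directly into a one-line test. The sketch below applies it to the vectors of Exercise 19 (whose stated answer is that they are not coplanar) and to a second, arbitrarily chosen triple. The names `triple` and `coplanar` are assumptions, and the exact comparison with zero is appropriate only for integer or rational data; floating-point input would need a tolerance.

```python
def triple(u, v, w):
    """Scalar triple product u . (v x w) via first-row cofactor expansion."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def coplanar(u, v, w):
    """True when u, v, w (with a common initial point) lie in one plane."""
    return triple(u, v, w) == 0

print(coplanar((-1, -2, 1), (3, 0, -2), (5, -4, 0)))  # False (triple product is 16)
print(coplanar((1, 2, 3), (4, 5, 6), (7, 8, 9)))      # True
```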


Concept  Review 

Cross  product  of  two  vectors 
Determinant  form  of  cross  product 
Scalar  triple  product 

Skills 

Compute the cross product of two vectors u and v in $R^3$.

Know the geometric relationship of $\mathbf{u} \times \mathbf{v}$ to u and v.

Know the properties of the cross product (listed in Theorem 3.5.2).

Compute the scalar triple product of three vectors in 3-space.

Know the geometric interpretation of the scalar triple product.

Compute the areas of triangles and parallelograms determined by two vectors or three points in 2-space
or 3-space.

Use the scalar triple product to determine whether three given vectors in 3-space are coplanar.


Exercise  Set  3.5 

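In Exercises 1-2, let $\mathbf{u} = (3, 2, -1)$, $\mathbf{v} = (0, 2, -3)$, and $\mathbf{w} = (2, 6, 7)$. Compute the indicated vectors.

1. (a) $\mathbf{v} \times \mathbf{w}$

(b) $\mathbf{u} \times (\mathbf{v} \times \mathbf{w})$

(c) $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w}$

Answer:

(a) (32, -6, -4)

(b) (-14, -20, -82)

(c) (27, 40, -42)

2. (a) $(\mathbf{u} \times \mathbf{v}) \times (\mathbf{v} \times \mathbf{w})$

(b) $\mathbf{u} \times (\mathbf{v} - 2\mathbf{w})$

(c) $(\mathbf{u} \times \mathbf{v}) - 2\mathbf{w}$

In Exercises 3-6, use the cross product to find a vector that is orthogonal to both u and v.

3. $\mathbf{u} = (-6, 4, 2)$, $\mathbf{v} = (3, 1, 5)$

Answer:

(18, 36, -18)

4. $\mathbf{u} = (1, 1, -2)$, $\mathbf{v} = (2, -1, 2)$

5. $\mathbf{u} = (-2, 1, 5)$, $\mathbf{v} = (3, 0, -3)$

Answer:

(-3, 9, -3)

6. $\mathbf{u} = (3, 3, 1)$, $\mathbf{v} = (0, 4, 2)$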

In  Exercises  7-10,  find  the  area  of  the  parallelogram  determined  by  the  given  vectors  u and  v. 

7. $\mathbf{u} = (1, -1, 2)$, $\mathbf{v} = (0, 3, 1)$

Answer:

$\sqrt{59}$

8. $\mathbf{u} = (3, -1, 4)$, $\mathbf{v} = (6, -2, 8)$

9. $\mathbf{u} = (2, 3, 0)$, $\mathbf{v} = (-1, 2, -2)$

Answer:

$\sqrt{101}$

10. $\mathbf{u} = (1, 1, 1)$, $\mathbf{v} = (3, 2, -5)$

In Exercises 11-12, find the area of the parallelogram with the given vertices.

11. $P_1(1, 2)$, $P_2(4, 4)$, $P_3(7, 5)$, $P_4(4, 3)$

Answer:

3

12. $P_1(3, 2)$, $P_2(5, 4)$, $P_3(9, 4)$, $P_4(7, 2)$

In Exercises 13-14, find the area of the triangle with the given vertices.

13. $A(2, 0)$, $B(3, 4)$, $C(-1, 2)$

Answer:

7

14. $A(1, 1)$, $B(2, 2)$, $C(3, -3)$

In Exercises 15-16, find the area of the triangle in 3-space that has the given vertices.

15. $P_1(2, 6, -1)$, $P_2(1, 1, 1)$, $P_3(4, 6, 2)$

Answer:

$\dfrac{\sqrt{374}}{2}$

16. $P(1, -1, 2)$, $Q(0, 3, 4)$, $R(6, 1, 8)$

In  Exercises  17-18,  find  the  volume  of  the  parallelepiped  with  sides  u,  v,  and  w. 

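17. $\mathbf{u} = (2, -6, 2)$, $\mathbf{v} = (0, 4, -2)$, $\mathbf{w} = (2, 2, -4)$

Answer:

16

18. $\mathbf{u} = (3, 1, 2)$, $\mathbf{v} = (4, 5, 1)$, $\mathbf{w} = (1, 2, 4)$

In Exercises 19-20, determine whether u, v, and w lie in the same plane when positioned so that their initial
points coincide.

19. $\mathbf{u} = (-1, -2, 1)$, $\mathbf{v} = (3, 0, -2)$, $\mathbf{w} = (5, -4, 0)$

Answer:

The vectors do not lie in the same plane.

20. $\mathbf{u} = (5, -2, 1)$, $\mathbf{v} = (4, -1, 1)$, $\mathbf{w} = (1, -1, 0)$

In Exercises 21-24, compute the scalar triple product $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})$.

21. $\mathbf{u} = (-2, 0, 6)$, $\mathbf{v} = (1, -3, 1)$, $\mathbf{w} = (-5, -1, 1)$

Answer:

-92

22. $\mathbf{u} = (-1, 2, 4)$, $\mathbf{v} = (3, 4, -2)$, $\mathbf{w} = (-1, 2, 5)$

23. $\mathbf{u} = (a, 0, 0)$, $\mathbf{v} = (0, b, 0)$, $\mathbf{w} = (0, 0, c)$

Answer:

abc

24. $\mathbf{u} = (3, -1, 6)$, $\mathbf{v} = (2, 4, 3)$, $\mathbf{w} = (5, -1, 2)$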

In  Exercises  25-26,  suppose  that  u • (v  x w)  = 3.  Find 

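25. (a) $\mathbf{u} \cdot (\mathbf{w} \times \mathbf{v})$

(b) $(\mathbf{v} \times \mathbf{w}) \cdot \mathbf{u}$

(c) $\mathbf{w} \cdot (\mathbf{u} \times \mathbf{v})$

Answer:

(a) -3

(b) 3

(c) 3

26. (a) $\mathbf{v} \cdot (\mathbf{u} \times \mathbf{w})$

(b) $(\mathbf{u} \times \mathbf{w}) \cdot \mathbf{v}$

(c) $\mathbf{v} \cdot (\mathbf{w} \times \mathbf{w})$

27. (a) Find the area of the triangle having vertices $A(1, 0, 1)$, $B(0, 2, 3)$, and $C(2, 1, 0)$.

(b) Use the result of part (a) to find the length of the altitude from vertex C to side AB.

Answer:

(a) $\dfrac{\sqrt{26}}{2}$

(b) $\dfrac{\sqrt{26}}{3}$

28. Use the cross product to find the sine of the angle between the vectors $\mathbf{u} = (2, 3, -6)$ and $\mathbf{v} = (2, 3, 6)$.

29. Simplify $(\mathbf{u} + \mathbf{v}) \times (\mathbf{u} - \mathbf{v})$.

Answer:

$2(\mathbf{v} \times \mathbf{u})$

30. Let $\mathbf{a} = (a_1, a_2, a_3)$, $\mathbf{b} = (b_1, b_2, b_3)$, $\mathbf{c} = (c_1, c_2, c_3)$, and $\mathbf{d} = (d_1, d_2, d_3)$. Show that

\[ (\mathbf{a} + \mathbf{d}) \cdot (\mathbf{b} \times \mathbf{c}) = \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) + \mathbf{d} \cdot (\mathbf{b} \times \mathbf{c}) \]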


31.  Let  u,  v,  and  w be  nonzero  vectors  in  3 -space  with  the  same  initial  point,  but  such  that  no  two  of  them  are 
collinear.  Show  that 

(a)  u x (v  x w)  lies  in  the  plane  determined  by  v and  w. 

(b)  (uxv)  x w lies  in  the  plane  determined  by  u and  v. 


32.  Prove  the  following  identities. 

(a) $(\mathbf{u} + k\mathbf{v}) \times \mathbf{v} = \mathbf{u} \times \mathbf{v}$

(b) $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{z}) = -(\mathbf{u} \times \mathbf{z}) \cdot \mathbf{v}$


33. Prove: If a, b, c, and d lie in the same plane, then $(\mathbf{a} \times \mathbf{b}) \times (\mathbf{c} \times \mathbf{d}) = \mathbf{0}$.

34. Prove: If $\theta$ is the angle between u and v and $\mathbf{u} \cdot \mathbf{v} \neq 0$, then $\tan\theta = \|\mathbf{u} \times \mathbf{v}\| / (\mathbf{u} \cdot \mathbf{v})$.

35. Show that if u, v, and w are vectors in $R^3$, no two of which are collinear, then $\mathbf{u} \times (\mathbf{v} \times \mathbf{w})$ lies in the plane
determined by v and w.

36. It is a theorem of solid geometry that the volume of a tetrahedron is $\tfrac{1}{3}$(area of base) · (height). Use this result
to prove that the volume of a tetrahedron whose sides are the vectors a, b, and c is $\tfrac{1}{6} |\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})|$ (see the
accompanying figure).


Figure Ex-36

37. Use the result of Exercise 36 to find the volume of the tetrahedron with vertices P, Q, R, S.

(a) $P(-1, 2, 0)$, $Q(2, 1, -3)$, $R(1, 1, 1)$, $S(3, -2, 3)$

(b) $P(0, 0, 0)$, $Q(1, 2, -1)$, $R(3, 4, 0)$, $S(-1, -3, 4)$

Answer:

(a) $\dfrac{17}{6}$

(b) $\dfrac{1}{2}$

38. Prove part (d) of Theorem 3.5.1. [Hint: First prove the result in the case where $\mathbf{w} = \mathbf{i} = (1, 0, 0)$, then when
$\mathbf{w} = \mathbf{j} = (0, 1, 0)$, and then when $\mathbf{w} = \mathbf{k} = (0, 0, 1)$. Finally, prove it for an arbitrary vector $\mathbf{w} = (w_1, w_2, w_3)$
by writing $\mathbf{w} = w_1\mathbf{i} + w_2\mathbf{j} + w_3\mathbf{k}$.]

39. Prove part (e) of Theorem 3.5.1. [Hint: Apply part (a) of Theorem 3.5.2 to the result in part (d) of Theorem
3.5.1.]

40. Prove:

(a) Prove (b) of Theorem 3.5.2.

(b) Prove (c) of Theorem 3.5.2.

(c) Prove (d) of Theorem 3.5.2.

(d) Prove (e) of Theorem 3.5.2.

(e) Prove (f) of Theorem 3.5.2.

True-False  Exercises 

In  parts  (a)-(f)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  The  cross  product  of  two  nonzero  vectors  u and  v is  a nonzero  vector  if  and  only  if  u and  v are  not  parallel. 
Answer: 

True 

(b)  A normal  vector  to  a plane  can  be  obtained  by  taking  the  cross  product  of  two  nonzero  and  noncollinear  vectors 
lying  in  the  plane. 

Answer: 

True 

(c)  The  scalar  triple  product  of  u,  v,  and  w determines  a vector  whose  length  is  equal  to  the  volume  of  the 
parallelepiped  determined  by  u,  v,  and  w. 


Answer: 


False 

(d) If u and v are vectors in 3-space, then $\|\mathbf{v} \times \mathbf{u}\|$ is equal to the area of the parallelogram determined by u and v.

Answer:

True

(e) For all vectors u, v, and w in 3-space, the vectors $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w}$ and $\mathbf{u} \times (\mathbf{v} \times \mathbf{w})$ are the same.

Answer:

False

(f) If u, v, and w are vectors in $R^3$, where u is nonzero and $\mathbf{u} \times \mathbf{v} = \mathbf{u} \times \mathbf{w}$, then $\mathbf{v} = \mathbf{w}$.

Answer:

False

Supplementary Exercises

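1. Let $\mathbf{u} = (-2, 0, 4)$, $\mathbf{v} = (3, -1, 6)$, and $\mathbf{w} = (2, -5, -5)$. Compute

(a) $3\mathbf{v} - 2\mathbf{u}$

(b) $\|\mathbf{u} + \mathbf{v} + \mathbf{w}\|$

(c) the distance between $-3\mathbf{u}$ and $\mathbf{v} + 5\mathbf{w}$

(d) $\operatorname{proj}_{\mathbf{w}} \mathbf{u}$

(e) $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})$

(f) $(-5\mathbf{v} + \mathbf{w}) \times ((\mathbf{u} \cdot \mathbf{v})\mathbf{w})$

Answer:

(a) $3\mathbf{v} - 2\mathbf{u} = (13, -3, 10)$

(b) $\|\mathbf{u} + \mathbf{v} + \mathbf{w}\| = \sqrt{70}$

(c) $\sqrt{774}$

(d) $\operatorname{proj}_{\mathbf{w}} \mathbf{u} = -\tfrac{12}{27}(2, -5, -5)$

(e) $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w}) = -122$

(f) $(-5\mathbf{v} + \mathbf{w}) \times ((\mathbf{u} \cdot \mathbf{v})\mathbf{w}) = (-3150, -2430, 1170)$

2. Repeat Exercise 1 for the vectors $\mathbf{u} = 3\mathbf{i} - 5\mathbf{j} + \mathbf{k}$, $\mathbf{v} = -2\mathbf{i} + 2\mathbf{k}$, and $\mathbf{w} = -\mathbf{j} + 4\mathbf{k}$.

3. Repeat parts (a)-(d) of Exercise 1 for the vectors $\mathbf{u} = (-2, 6, 2, 1)$, $\mathbf{v} = (-3, 0, 8, 0)$, and
$\mathbf{w} = (9, 1, -6, -6)$.

Answer:

(a) $3\mathbf{v} - 2\mathbf{u} = (-5, -12, 20, -2)$

(b) $\|\mathbf{u} + \mathbf{v} + \mathbf{w}\| = \sqrt{106}$

(c) $\sqrt{2810}$

(d) $\operatorname{proj}_{\mathbf{w}} \mathbf{u} = -\tfrac{15}{77}(9, 1, -6, -6)$

4. Repeat parts (a)-(d) of Exercise 1 for the vectors $\mathbf{u} = (0, 5, 0, -1, -2)$, $\mathbf{v} = (1, -1, 6, -2, 0)$, and
$\mathbf{w} = (-4, -1, 4, 0, 2)$.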

In  Exercises  5-6,  determine  whether  the  given  set  of  vectors  forms  an  orthogonal  set.  If  so,  normalize  each 
vector  to  form  an  orthonormal  set. 


5.  (-32,  -1,19),  (3,  -1,5),  (1,6,  2) 


Answer: 


Not  an  orthogonal  set 


6.  ( — 2,  0,  1),  (1,  1,  2),  (1,  -5,2) 

7. (a) The set of all vectors in $R^2$ that are orthogonal to a nonzero vector is what kind of geometric object?

(b) The set of all vectors in $R^3$ that are orthogonal to a nonzero vector is what kind of geometric object?

(c) The set of all vectors in $R^2$ that are orthogonal to two noncollinear vectors is what kind of geometric
object?

(d) The set of all vectors in $R^3$ that are orthogonal to two noncollinear vectors is what kind of geometric
object?

Answer: 


(a)  A line  through  the  origin,  perpendicular  to  the  given  vector. 

(b)  A plane  through  the  origin,  perpendicular  to  the  given  vector. 

(c)  {0}  (the  origin) 

(d)  A line  through  the  origin,  perpendicular  to  the  plane  containing  the  two  noncollinear  vectors. 


m | y 12  2^ 

Show  that  vi  = I — , — , — | and  V2  = | — , — , — — 1 are  orthonormal  vectors,  and  find  a third  vector  V3  for 
which  {vi,  V2,  V3)  is  an  orthonormal  set. 

9 9 9 

9. True or False: If u and v are nonzero vectors such that $\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$, then u and v are
orthogonal.

Answer:

True

10. True or False: If u is orthogonal to $\mathbf{v} + \mathbf{w}$, then u is orthogonal to v and w.

11. Consider the points $P(3, -1, 4)$, $Q(6, 0, 2)$, and $R(5, 1, 1)$. Find the point S in $R^3$ whose first
component is -1 and such that $\overrightarrow{PQ}$ is parallel to $\overrightarrow{RS}$.

Answer:

$S(-1, -1, 5)$

12. Consider the points $P(-3, 1, 0, 6)$, $Q(0, 5, 1, -2)$, and $R(-4, 1, 4, 0)$. Find the point S in $R^4$ whose
third component is 6 and such that $\overrightarrow{PQ}$ is parallel to $\overrightarrow{RS}$.

13. Using the points in Exercise 11, find the cosine of the angle between the vectors $\overrightarrow{PQ}$ and $\overrightarrow{PR}$.

Answer:


14. Using the points in Exercise 12, find the cosine of the angle between the vectors $\overrightarrow{PQ}$ and $\overrightarrow{PR}$.

15. Find the distance between the point $P(-3, 1, 3)$ and the plane $5x + z = 3y - 4$.

Answer:

$\dfrac{11}{\sqrt{35}}$

16. Show that the planes $3x - y + 6z = 7$ and $-6x + 2y - 12z = 1$ are parallel, and find the distance
between the planes.
between  the  planes. 

In Exercises 17-22, find vector and parametric equations for the line or plane in question.

17. The plane in $R^3$ that contains the points $P(-2, 1, 3)$, $Q(-1, -1, 1)$, and $R(3, 0, -2)$.

Answer:

Vector equation: $(x, y, z) = (-2, 1, 3) + t_1(1, -2, -2) + t_2(5, -1, -5)$;

parametric equations: $x = -2 + t_1 + 5t_2$, $y = 1 - 2t_1 - t_2$, $z = 3 - 2t_1 - 5t_2$

18. The line in $R^3$ that contains the point $P(-1, 6, 0)$ and is orthogonal to the plane $4x - z = 5$.

19. The line in $R^2$ that is parallel to the vector $\mathbf{v} = (8, -1)$ and contains the point $P(0, -3)$.

Answer:

Vector equation: $(x, y) = (0, -3) + t(8, -1)$;

parametric equations: $x = 8t$, $y = -3 - t$

20. The plane in $R^3$ that contains the point $P(-2, 1, 0)$ and is parallel to the plane $-8x + 6y - z = 4$.

21. The line in $R^2$ with equation $y = 3x - 5$.

Answer:

A possible answer is vector equation: $(x, y) = (0, -5) + t(1, 3)$; parametric equations:

$x = t$, $y = -5 + 3t$

22. The plane in $R^3$ with equation $2x - 6y + 3z = 5$.

In Exercises 23-25, find a point-normal equation for the given plane.

23. The plane that is represented by the vector equation
$(x, y, z) = (-1, 5, 6) + t_1(0, -1, 3) + t_2(2, -1, 0)$.

Answer:

$3(x + 1) + 6(y - 5) + 2(z - 6) = 0$

24. The plane that contains the point $P(-5, 1, 0)$ and is orthogonal to the line with parametric equations
$x = 3 - 5t$, $y = 2t$, and $z = 7$.

25. The plane that passes through the points $P(9, 0, 4)$, $Q(-1, 4, 3)$, and $R(0, 6, -2)$.

Answer:

$-18(x - 9) - 51y - 24(z - 4) = 0$


26. Suppose that $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ and $\{\mathbf{w}_1, \mathbf{w}_2\}$ are two sets of vectors such that $\mathbf{v}_i$ and $\mathbf{w}_j$ are orthogonal for
all i and j. Prove that if $a_1, a_2, a_3, b_1, b_2$ are any scalars, then the vectors $\mathbf{v} = a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + a_3\mathbf{v}_3$ and
$\mathbf{w} = b_1\mathbf{w}_1 + b_2\mathbf{w}_2$ are orthogonal.

27. Prove that if two vectors u and v in $R^2$ are orthogonal to a nonzero vector w in $R^2$, then u and v are scalar
multiples of each other.

28. Prove that $\|\mathbf{u} + \mathbf{v}\| = \|\mathbf{u}\| + \|\mathbf{v}\|$ if and only if u and v are parallel vectors.

29. The equation $Ax + By = 0$ represents a line through the origin in $R^2$ if A and B are not both zero. What
does this equation represent in $R^3$ if you think of it as $Ax + By + 0z = 0$? Explain.

Answer:

A plane

INTRODUCTION

Recall that we began our study of vectors by viewing them as directed line segments
(arrows). We then extended this idea by introducing rectangular coordinate systems, which
enabled us to view vectors as ordered pairs and ordered triples of real numbers. As we
developed properties of these vectors we noticed patterns in various formulas that enabled
us to extend the notion of a vector to an n-tuple of real numbers. Although n-tuples took
us outside the realm of our “visual experience,” they gave us a valuable tool for
understanding and studying systems of linear equations. In this chapter we will extend the
concept of a vector yet again by using the most important algebraic properties of vectors
in $R^n$ as axioms. These axioms, if satisfied by a set of objects, will enable us to think of
those objects as vectors.

4.1  Real Vector Spaces

In this section we will extend the concept of a vector by using the basic properties of vectors in $R^n$ as axioms, which, if satisfied
by a set of objects, guarantee that those objects behave like familiar vectors.


Vector  Space  Axioms 

The following definition consists of ten axioms, eight of which are properties of vectors in $R^n$ that were stated in Theorem 3.1.1.
It  is  important  to  keep  in  mind  that  one  does  not  prove  axioms;  rather,  they  are  assumptions  that  serve  as  the  starting  point  for 
proving  theorems. 

Vector  space  scalars  can  be  real  numbers  or  complex 
numbers.  Vector  spaces  with  real  scalars  are  called  real 
vector  spaces  and  those  with  complex  scalars  are  called 
complex  vector  spaces.  For  now  we  will  be  concerned 
exclusively  with  real  vector  spaces.  We  will  consider 
complex  vector  spaces  later. 




DEFINITION  1 

Let V be an arbitrary nonempty set of objects on which two operations are defined: addition, and multiplication by
scalars. By addition we mean a rule for associating with each pair of objects u and v in V an object $\mathbf{u} + \mathbf{v}$, called the
sum of u and v; by scalar multiplication we mean a rule for associating with each scalar k and each object u in V an
object $k\mathbf{u}$, called the scalar multiple of u by k. If the following axioms are satisfied by all objects u, v, w in V and all
scalars k and m, then we call V a vector space and we call the objects in V vectors.

1. If u and v are objects in V, then $\mathbf{u} + \mathbf{v}$ is in V.

2. $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$

3. $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$

4. There is an object 0 in V, called a zero vector for V, such that $\mathbf{0} + \mathbf{u} = \mathbf{u} + \mathbf{0} = \mathbf{u}$ for all u in V.

5. For each u in V, there is an object $-\mathbf{u}$ in V, called a negative of u, such that $\mathbf{u} + (-\mathbf{u}) = (-\mathbf{u}) + \mathbf{u} = \mathbf{0}$.

6. If k is any scalar and u is any object in V, then $k\mathbf{u}$ is in V.

7. $k(\mathbf{u} + \mathbf{v}) = k\mathbf{u} + k\mathbf{v}$

8. $(k + m)\mathbf{u} = k\mathbf{u} + m\mathbf{u}$

9. $k(m\mathbf{u}) = (km)(\mathbf{u})$

10. $1\mathbf{u} = \mathbf{u}$


Observe  that  the  definition  of  a vector  space  does  not  specify  the  nature  of  the  vectors  or  the  operations.  Any  kind  of  object  can 
be  a vector,  and  the  operations  of  addition  and  scalar  multiplication  need  not  have  any  relationship  to  those  on  Rn.  The  only 
requirement  is  that  the  ten  vector  space  axioms  be  satisfied.  In  the  examples  that  follow  we  will  use  four  basic  steps  to  show 
that  a set  with  two  operations  is  a vector  space. 



To Show that a Set with Two Operations is a Vector Space

Step 1  Identify the set V of objects that will become vectors.

Step 2  Identify the addition and scalar multiplication operations on V.

Step 3  Verify Axioms 1 and 6; that is, adding two vectors in V produces a vector in V, and multiplying a vector in V by
a scalar also produces a vector in V. Axiom 1 is called closure under addition, and Axiom 6 is called closure under
scalar multiplication.

Step 4  Confirm that Axioms 2, 3, 4, 5, 7, 8, 9, and 10 hold.
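
Step 4 can be rehearsed by machine before it is proved by hand. The sketch below tests Axioms 2, 3, 4, 5, 7, 8, 9, and 10 on a finite list of sample vectors and scalars. The function `check_axioms`, the sample data, and the deliberately broken scalar multiplication `smul_bad` are all hypothetical, introduced only for contrast. A failure found this way is a genuine counterexample; passing on samples proves nothing.

```python
from fractions import Fraction

def add(u, v):
    """Usual componentwise addition of tuples."""
    return tuple(a + b for a, b in zip(u, v))

def smul(k, u):
    """Usual scalar multiplication of a tuple."""
    return tuple(k * a for a in u)

def neg(u):
    """Usual negative of a tuple."""
    return tuple(-a for a in u)

def smul_bad(k, u):
    """Hypothetical non-standard rule: keeps only the first component."""
    return (k * u[0], 0, 0)

def check_axioms(samples, scalars, add, smul, zero, neg):
    """Return the sorted axiom numbers (2-5, 7-10) that fail on the samples."""
    failed = set()
    for u in samples:
        if add(zero, u) != u or add(u, zero) != u:
            failed.add(4)
        if add(u, neg(u)) != zero or add(neg(u), u) != zero:
            failed.add(5)
        if smul(1, u) != u:
            failed.add(10)
        for v in samples:
            if add(u, v) != add(v, u):
                failed.add(2)
            for w in samples:
                if add(u, add(v, w)) != add(add(u, v), w):
                    failed.add(3)
            for k in scalars:
                if smul(k, add(u, v)) != add(smul(k, u), smul(k, v)):
                    failed.add(7)
        for k in scalars:
            for m in scalars:
                if smul(k + m, u) != add(smul(k, u), smul(m, u)):
                    failed.add(8)
                if smul(k, smul(m, u)) != smul(k * m, u):
                    failed.add(9)
    return sorted(failed)

# Exact rational arithmetic avoids floating-point false alarms.
samples = [(1, 2, 3), (0, -1, 4), (Fraction(1, 2), 0, -2)]
scalars = [Fraction(-3, 2), 0, 2]
zero = (0, 0, 0)

print(check_axioms(samples, scalars, add, smul, zero, neg))      # []
print(check_axioms(samples, scalars, add, smul_bad, zero, neg))  # [10]
```

With the broken rule only Axiom 10 fails on these samples, which shows that a single failing axiom is enough to disqualify a candidate vector space.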




Hermann Günther Grassmann (1809-1877)

The notion of an “abstract vector space” evolved over many years and had many contributors. The
idea crystallized with the work of the German mathematician H. G. Grassmann, who published a paper in 1862 in which
he considered abstract systems of unspecified elements on which he defined formal operations of addition and scalar
multiplication. Grassmann's work was controversial, and others, including Augustin Cauchy (p. 137), laid reasonable
claim to the idea.

Our first example is the simplest of all vector spaces in that it contains only one object. Since Axiom 4 requires that every
vector space contain a zero vector, the object will have to be that vector.

EXAMPLE  1 The  Zero  Vector  Space 

Let  V consist  of  a single  object,  which  we  denote  by  0,  and  define 

\[ \mathbf{0} + \mathbf{0} = \mathbf{0} \quad \text{and} \quad k\mathbf{0} = \mathbf{0} \]

for  all  scalars  k.  It  is  easy  to  check  that  all  the  vector  space  axioms  are  satisfied.  We  call  this  the  zero  vector 
space. 


Our second example is one of the most important of all vector spaces: the familiar space $R^n$. It should not be surprising that
the operations on $R^n$ satisfy the vector space axioms because those axioms were based on known properties of operations on $R^n$.


EXAMPLE 2  $R^n$ Is a Vector Space

Let $V = R^n$, and define the vector space operations on V to be the usual operations of addition and scalar
multiplication of n-tuples; that is,

\[ \mathbf{u} + \mathbf{v} = (u_1, u_2, \ldots, u_n) + (v_1, v_2, \ldots, v_n) = (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n) \]

\[ k\mathbf{u} = (ku_1, ku_2, \ldots, ku_n) \]

The set $V = R^n$ is closed under addition and scalar multiplication because the foregoing operations produce
n-tuples as their end result, and these operations satisfy Axioms 2, 3, 4, 5, 7, 8, 9, and 10 by virtue of Theorem
3.1.1.


Our  next  example  is  a generalization  of  Rn  in  which  we  allow  vectors  to  have  infinitely  many  components. 

EXAMPLE  3 The  Vector  Space  of  Infinite  Sequences  of  Real  Numbers 

Let V consist of objects of the form

\[ \mathbf{u} = (u_1, u_2, \ldots, u_n, \ldots) \]

in which $u_1, u_2, \ldots, u_n, \ldots$ is an infinite sequence of real numbers. We define two infinite sequences to be equal if
their corresponding components are equal, and we define addition and scalar multiplication componentwise by

\[ \begin{aligned} \mathbf{u} + \mathbf{v} &= (u_1, u_2, \ldots, u_n, \ldots) + (v_1, v_2, \ldots, v_n, \ldots) \\ &= (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n, \ldots) \end{aligned} \]

\[ k\mathbf{u} = (ku_1, ku_2, \ldots, ku_n, \ldots) \]

We leave it as an exercise to confirm that V with these operations is a vector space. We will denote this vector
space by the symbol $R^\infty$.

In  the  next  example  our  vectors  will  be  matrices.  This  may  be  a little  confusing  at  first  because  matrices  are  composed  of  rows 
and  columns,  which  are  themselves  vectors  (row  vectors  and  column  vectors).  However,  here  we  will  not  be  concerned  with  the 
individual  rows  and  columns  but  rather  with  the  properties  of  the  matrix  operations  as  they  relate  to  the  matrix  as  a whole. 

Note  that  Equation  1 involves  three  different  addition 
operations:  the  addition  operation  on  vectors,  the 
addition  operation  on  matrices,  and  the  addition 
operation  on  real  numbers. 


EXAMPLE  4 A Vector  Space  of  2 x 2 Matrices 


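Let V be the set of 2 x 2 matrices with real entries, and take the vector space operations on V to be the usual
operations of matrix addition and scalar multiplication; that is,

\[ \begin{aligned} \mathbf{u} + \mathbf{v} &= \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} + \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} = \begin{bmatrix} u_{11} + v_{11} & u_{12} + v_{12} \\ u_{21} + v_{21} & u_{22} + v_{22} \end{bmatrix} \\ k\mathbf{u} &= k \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \begin{bmatrix} ku_{11} & ku_{12} \\ ku_{21} & ku_{22} \end{bmatrix} \end{aligned} \tag{1} \]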


The  set  V is  closed  under  addition  and  scalar  multiplication  because  the  foregoing  operations  produce  2x2 
matrices  as  the  end  result.  Thus,  it  remains  to  confirm  that  Axioms  2,  3,  4,  5,  7,  8,  9,  and  10  hold.  Some  of  these 
are  standard  properties  of  matrix  operations.  For  example,  Axiom  2 follows  from  Theorem  1.4.1a  since 


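\[ \mathbf{u} + \mathbf{v} = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} + \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} = \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix} + \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \mathbf{v} + \mathbf{u} \]

Similarly, Axioms 3, 7, 8, and 9 follow from parts (b), (h), (j), and (e), respectively, of that theorem (verify). This
leaves Axioms 4, 5, and 10 that remain to be verified.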


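To confirm that Axiom 4 is satisfied, we must find a 2 x 2 matrix 0 in V for which $\mathbf{0} + \mathbf{u} = \mathbf{u} + \mathbf{0} = \mathbf{u}$ for all 2 x 2
matrices u in V. We can do this by taking

\[ \mathbf{0} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \]

With this definition,

\[ \mathbf{0} + \mathbf{u} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \mathbf{u} \]

and similarly $\mathbf{u} + \mathbf{0} = \mathbf{u}$. To verify that Axiom 5 holds we must show that each object u in V has a negative $-\mathbf{u}$ in
V such that $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$ and $(-\mathbf{u}) + \mathbf{u} = \mathbf{0}$. This can be done by defining the negative of u to be

\[ -\mathbf{u} = \begin{bmatrix} -u_{11} & -u_{12} \\ -u_{21} & -u_{22} \end{bmatrix} \]

With this definition,

\[ \mathbf{u} + (-\mathbf{u}) = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} + \begin{bmatrix} -u_{11} & -u_{12} \\ -u_{21} & -u_{22} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = \mathbf{0} \]

and similarly $(-\mathbf{u}) + \mathbf{u} = \mathbf{0}$. Finally, Axiom 10 holds because

\[ 1\mathbf{u} = 1 \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{bmatrix} = \mathbf{u} \]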


EXAMPLE  5 The  Vector  Space  of  m x n Matrices 

Example 4 is a special case of a more general class of vector spaces. You should have no trouble adapting the
argument used in that example to show that the set V of all m x n matrices with the usual matrix operations of
addition and scalar multiplication is a vector space. We will denote this vector space by the symbol $M_{mn}$. Thus,
for example, the vector space in Example 4 is denoted as $M_{22}$.


In Example 6 the functions were defined on the entire
interval $(-\infty, \infty)$. However, the arguments used in
that example apply as well on all subintervals of
$(-\infty, \infty)$, such as a closed interval $[a, b]$ or an open
interval $(a, b)$. We will denote the vector spaces of
functions on these intervals by $F[a, b]$ and $F(a, b)$,
respectively.

EXAMPLE  6 The  Vector  Space  of  Real-Valued  Functions 

Let $V$ be the set of real-valued functions that are defined at each $x$ in the interval $(-\infty, \infty)$. If $\mathbf{f} = f(x)$ and $\mathbf{g} = g(x)$ are two functions in $V$ and if $k$ is any scalar, then define the operations of addition and scalar multiplication by

\[
(\mathbf{f} + \mathbf{g})(x) = f(x) + g(x) \tag{2}
\]
\[
(k\mathbf{f})(x) = k f(x) \tag{3}
\]

One way to think about these operations is to view the numbers $f(x)$ and $g(x)$ as "components" of $\mathbf{f}$ and $\mathbf{g}$ at the point $x$, in which case Equations 2 and 3 state that two functions are added by adding corresponding components, and a function is multiplied by a scalar by multiplying each component by that scalar, exactly as in $R^n$ and $R^\infty$. This idea is illustrated in parts (a) and (b) of Figure 4.1.1. The set $V$ with these operations is denoted by the symbol $F(-\infty, \infty)$. We can prove that this is a vector space as follows:


Axioms 1 and 6: These closure axioms require that if $\mathbf{f}$ and $\mathbf{g}$ are functions defined at each $x$ in the interval $(-\infty, \infty)$ and $k$ is any scalar, then $\mathbf{f} + \mathbf{g}$ and $k\mathbf{f}$ are also defined at each $x$ in the interval $(-\infty, \infty)$. This follows from Formulas 2 and 3.

Axiom 4: This axiom requires that there exists a function $\mathbf{0}$ in $F(-\infty, \infty)$ which, when added to any other function $\mathbf{f}$ in $F(-\infty, \infty)$, produces $\mathbf{f}$ back again as the result. The function whose value at every point $x$ in the interval $(-\infty, \infty)$ is zero has this property. Geometrically, the graph of the function $\mathbf{0}$ is the line that coincides with the $x$-axis.

Axiom 5: This axiom requires that for each function $\mathbf{f}$ in $F(-\infty, \infty)$ there exists a function $-\mathbf{f}$ in $F(-\infty, \infty)$ which, when added to $\mathbf{f}$, produces the function $\mathbf{0}$. The function defined by $(-\mathbf{f})(x) = -f(x)$ has this property. The graph of $-\mathbf{f}$ can be obtained by reflecting the graph of $\mathbf{f}$ about the $x$-axis (Figure 4.1.1c).

Axioms 2, 3, 7, 8, 9, 10: The validity of each of these axioms follows from properties of real numbers. For example, if $\mathbf{f}$ and $\mathbf{g}$ are functions in $F(-\infty, \infty)$, then Axiom 2 requires that $\mathbf{f} + \mathbf{g} = \mathbf{g} + \mathbf{f}$. This follows from the computation

\[
(\mathbf{f} + \mathbf{g})(x) = f(x) + g(x) = g(x) + f(x) = (\mathbf{g} + \mathbf{f})(x)
\]

in which the first and last equalities follow from Formula 2, and the middle equality is a property of real numbers. We will leave the proofs of the remaining parts as exercises.


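[Figure 4.1.1(c): the graphs of $\mathbf{f}$, $\mathbf{0}$, and $-\mathbf{f}$, with the values $f(x)$ and $-f(x)$ marked.]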
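The operations of Example 6 can be tried out numerically. The following is a minimal sketch, assuming functions are modeled as Python callables; the helper names (`add`, `scale`, `neg`, `zero`) and the sample functions are illustrative and not part of the text. It only spot-checks Axioms 2, 4, 5, and 10 at a few points, which is evidence and not a proof.

```python
# Functions on (-inf, inf) are modeled as Python callables.
# All names below are illustrative assumptions, not notation from the text.

def add(f, g):
    """Formula 2: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def scale(k, f):
    """Formula 3: (kf)(x) = k * f(x)."""
    return lambda x: k * f(x)

def neg(f):
    """The negative of f: (-f)(x) = -f(x)."""
    return lambda x: -f(x)

def zero(x):
    """The zero vector: the function whose value is 0 at every x."""
    return 0.0

def f(x):
    return x * x

def g(x):
    return 3 * x + 1

sample_points = [-2.0, -0.5, 0.0, 1.0, 4.0]

def axioms_hold_at_samples():
    """Spot-check Axioms 2, 4, 5, and 10 at the sample points."""
    return all(
        add(f, g)(x) == add(g, f)(x)        # Axiom 2: f + g = g + f
        and add(f, zero)(x) == f(x)         # Axiom 4: f + 0 = f
        and add(f, neg(f))(x) == zero(x)    # Axiom 5: f + (-f) = 0
        and scale(1, f)(x) == f(x)          # Axiom 10: 1f = f
        for x in sample_points
    )

print(axioms_hold_at_samples())
```

Each check mirrors the argument in the text: the comparison is made value by value, so it reduces to a property of real numbers.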


It is important to recognize that you cannot impose any two operations on any set $V$ and expect the vector space axioms to hold. For example, if $V$ is the set of $n$-tuples with positive components, and if the standard operations from $R^n$ are used, then $V$ is not closed under scalar multiplication, because if $\mathbf{u}$ is a nonzero $n$-tuple in $V$, then $(-1)\mathbf{u}$ has at least one negative component and hence is not in $V$. The following is a less obvious example in which only one of the ten vector space axioms fails to hold.

EXAMPLE  7 A Set  That  Is  Not  a Vector  Space 

Let $V = R^2$ and define addition and scalar multiplication operations as follows: If $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$, then define

\[
\mathbf{u} + \mathbf{v} = (u_1 + v_1, \; u_2 + v_2)
\]

and if $k$ is any real number, then define

\[
k\mathbf{u} = (ku_1, \; 0)
\]

For example, if $\mathbf{u} = (2, 4)$, $\mathbf{v} = (-3, 5)$, and $k = 7$, then

\[
\mathbf{u} + \mathbf{v} = (2 + (-3), \; 4 + 5) = (-1, 9)
\]
\[
k\mathbf{u} = 7\mathbf{u} = (7 \cdot 2, \; 0) = (14, 0)
\]

The addition operation is the standard one from $R^2$, but the scalar multiplication is not. In the exercises we will ask you to show that the first nine vector space axioms are satisfied. However, Axiom 10 fails to hold for certain vectors. For example, if $\mathbf{u} = (u_1, u_2)$ is such that $u_2 \neq 0$, then

\[
1\mathbf{u} = 1(u_1, u_2) = (1 \cdot u_1, \; 0) = (u_1, 0) \neq \mathbf{u}
\]

Thus, $V$ is not a vector space with the stated operations.
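The computations in Example 7 are easy to reproduce. The sketch below assumes vectors in $R^2$ are stored as Python tuples; the function names `add` and `scale` are illustrative choices, not notation from the text.

```python
# Example 7 operations on pairs (u1, u2); names are illustrative.

def add(u, v):
    """Standard addition on R^2."""
    return (u[0] + v[0], u[1] + v[1])

def scale(k, u):
    """Nonstandard scalar multiplication: the second component is always 0."""
    return (k * u[0], 0)

u = (2, 4)
v = (-3, 5)

print(add(u, v))           # (-1, 9)
print(scale(7, u))         # (14, 0)
print(scale(1, u) == u)    # False, so Axiom 10 fails for this u
```

Any vector with a nonzero second component gives the same failure, which matches the condition $u_2 \neq 0$ in the example.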


Our  final  example  will  be  an  unusual  vector  space  that  we  have  included  to  illustrate  how  varied  vector  spaces  can  be.  Since  the 
objects  in  this  space  will  be  real  numbers,  it  will  be  important  for  you  to  keep  track  of  which  operations  are  intended  as  vector 
operations  and  which  ones  as  ordinary  operations  on  real  numbers. 

EXAMPLE  8 An  Unusual  Vector  Space 

Let $V$ be the set of positive real numbers, and define the operations on $V$ to be

\[
u + v = uv \quad \text{[Vector addition is numerical multiplication.]}
\]
\[
ku = u^k \quad \text{[Scalar multiplication is numerical exponentiation.]}
\]

Thus, for example, $1 + 1 = 1$ and $(2)(1) = 1^2 = 1$. This looks strange, but nevertheless the set $V$ with these operations satisfies the 10 vector space axioms and hence is a vector space. We will confirm Axioms 4, 5, and 7, and leave the others as exercises.

• Axiom 4: The zero vector in this space is the number 1 (i.e., $\mathbf{0} = 1$) since

\[
u + 1 = u \cdot 1 = u
\]

• Axiom 5: The negative of a vector $u$ is its reciprocal (i.e., $-u = 1/u$) since

\[
u + \frac{1}{u} = u \left( \frac{1}{u} \right) = 1 \; (= \mathbf{0})
\]

• Axiom 7: $k(u + v) = (uv)^k = u^k v^k = (ku) + (kv)$
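Because the vector operations in Example 8 differ from the ordinary arithmetic on the same numbers, it can help to give them separate names. The following sketch assumes the hypothetical names `vadd`, `smul`, `ZERO`, and `neg` for the vector addition, scalar multiplication, zero vector, and negative; it checks Axioms 4, 5, and 7 for one choice of values only.

```python
import math

# Vector space of positive reals from Example 8; all names are illustrative.

def vadd(u, v):
    """Vector addition is numerical multiplication."""
    return u * v

def smul(k, u):
    """Scalar multiplication is numerical exponentiation."""
    return u ** k

ZERO = 1.0  # the zero vector of this space is the number 1

def neg(u):
    """The negative of u is its reciprocal."""
    return 1.0 / u

u, v, k = 2.0, 5.0, 3.0

axiom4 = vadd(u, ZERO) == u                         # u + 0 = u
axiom5 = math.isclose(vadd(u, neg(u)), ZERO)        # u + (-u) = 0
axiom7 = math.isclose(smul(k, vadd(u, v)),          # k(u + v) = ku + kv
                      vadd(smul(k, u), smul(k, v)))

print(axiom4, axiom5, axiom7)
```

Keeping `vadd` and `smul` distinct from `+` and `*` is a design choice that makes it harder to confuse the two kinds of operations the text warns about.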


Some  Properties  of  Vectors 

The  following  is  our  first  theorem  about  general  vector  spaces.  As  you  will  see,  its  proof  is  very  formal  with  each  step  being 
justified  by  a vector  space  axiom  or  a known  property  of  real  numbers.  There  will  not  be  many  rigidly  formal  proofs  of  this  type 
in  the  text,  but  we  have  included  these  to  reinforce  the  idea  that  the  familiar  properties  of  vectors  can  all  be  derived  from  the 
vector  space  axioms. 


THEOREM  4.1.1 

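Let $V$ be a vector space, $\mathbf{u}$ a vector in $V$, and $k$ a scalar; then:

(a) $0\mathbf{u} = \mathbf{0}$

(b) $k\mathbf{0} = \mathbf{0}$

(c) $(-1)\mathbf{u} = -\mathbf{u}$

(d) If $k\mathbf{u} = \mathbf{0}$, then $k = 0$ or $\mathbf{u} = \mathbf{0}$.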


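We will prove parts (a) and (c) and leave proofs of the remaining parts as exercises.

Proof (a) We can write

\[
\begin{aligned}
0\mathbf{u} + 0\mathbf{u} &= (0 + 0)\mathbf{u} && \text{[Axiom 8]} \\
&= 0\mathbf{u} && \text{[Property of the number 0]}
\end{aligned}
\]

By Axiom 5 the vector $0\mathbf{u}$ has a negative, $-0\mathbf{u}$. Adding this negative to both sides above yields

\[
[0\mathbf{u} + 0\mathbf{u}] + (-0\mathbf{u}) = 0\mathbf{u} + (-0\mathbf{u})
\]

or

\[
\begin{aligned}
0\mathbf{u} + [0\mathbf{u} + (-0\mathbf{u})] &= 0\mathbf{u} + (-0\mathbf{u}) && \text{[Axiom 3]} \\
0\mathbf{u} + \mathbf{0} &= \mathbf{0} && \text{[Axiom 5]} \\
0\mathbf{u} &= \mathbf{0} && \text{[Axiom 4]}
\end{aligned}
\]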


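Proof (c) To prove that $(-1)\mathbf{u} = -\mathbf{u}$, we must show that $\mathbf{u} + (-1)\mathbf{u} = \mathbf{0}$. The proof is as follows:

\[
\begin{aligned}
\mathbf{u} + (-1)\mathbf{u} &= 1\mathbf{u} + (-1)\mathbf{u} && \text{[Axiom 10]} \\
&= (1 + (-1))\mathbf{u} && \text{[Axiom 8]} \\
&= 0\mathbf{u} && \text{[Property of numbers]} \\
&= \mathbf{0} && \text{[Part (a) of this theorem]}
\end{aligned}
\]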


A Closing  Observation 

This section of the text is very important to the overall plan of linear algebra in that it establishes a common thread between such diverse mathematical objects as geometric vectors, vectors in $R^n$, infinite sequences, matrices, and real-valued functions, to name a few. As a result, whenever we discover a new theorem about general vector spaces, we will at the same time be discovering a theorem about geometric vectors, vectors in $R^n$, sequences, matrices, real-valued functions, and about any new kinds of vectors that we might discover.

To  illustrate  this  idea,  consider  what  the  rather  innocent-looking  result  in  part  (a)  of  Theorem  4.1.1  says  about  the  vector  space 
in  Example  8.  Keeping  in  mind  that  the  vectors  in  that  space  are  positive  real  numbers,  that  scalar  multiplication  means 
numerical  exponentiation,  and  that  the  zero  vector  is  the  number  1,  the  equation 

\[
0\mathbf{u} = \mathbf{0}
\]

is a statement of the fact that if $u$ is a positive real number, then

\[
u^0 = 1
\]


Concept  Review 

Vector  space 

Closure  under  addition 

Closure  under  scalar  multiplication 

Examples  of  vector  spaces 

Skills 

Determine  whether  a given  set  with  two  operations  is  a vector  space. 

Show  that  a set  with  two  operations  is  not  a vector  space  by  demonstrating  that  at  least  one  of  the  vector  space  axioms 
fails. 


Exercise  Set  4.1 


1. Let $V$ be the set of all ordered pairs of real numbers, and consider the following addition and scalar multiplication operations on $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$:

\[
\mathbf{u} + \mathbf{v} = (u_1 + v_1, \; u_2 + v_2), \qquad k\mathbf{u} = (0, \; ku_2)
\]

(a) Compute $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ for $\mathbf{u} = (-1, 2)$, $\mathbf{v} = (3, 4)$, and $k = 3$.

(b) In words, explain why $V$ is closed under addition and scalar multiplication.

(c) Since addition on $V$ is the standard addition operation on $R^2$, certain vector space axioms hold for $V$ because they are known to hold for $R^2$. Which axioms are they?

(d) Show that Axioms 7, 8, and 9 hold.

(e) Show that Axiom 10 fails and hence that $V$ is not a vector space under the given operations.

Answer:

(a) $\mathbf{u} + \mathbf{v} = (2, 6)$, $3\mathbf{u} = (0, 6)$

(c) Axioms 1-5

2. Let $V$ be the set of all ordered pairs of real numbers, and consider the following addition and scalar multiplication operations on $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$:

\[
\mathbf{u} + \mathbf{v} = (u_1 + v_1 + 1, \; u_2 + v_2 + 1), \qquad k\mathbf{u} = (ku_1, \; ku_2)
\]

(a) Compute $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ for $\mathbf{u} = (0, 4)$, $\mathbf{v} = (1, -3)$, and $k = 2$.

(b) Show that $(0, 0) \neq \mathbf{0}$.

(c) Show that $(-1, -1) = \mathbf{0}$.

(d) Show that Axiom 5 holds by producing an ordered pair $-\mathbf{u}$ such that $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$ for $\mathbf{u} = (u_1, u_2)$.

(e) Find two vector space axioms that fail to hold.

In  Exercises  3-12,  determine  whether  each  set  equipped  with  the  given  operations  is  a vector  space.  For  those  that  are  not 
vector  spaces  identify  the  vector  space  axioms  that  fail. 

3.  The  set  of  all  real  numbers  with  the  standard  operations  of  addition  and  multiplication. 

Answer: 

The  set  is  a vector  space  with  the  given  operations. 

4. The set of all pairs of real numbers of the form $(x, 0)$ with the standard operations on $R^2$.

5. The set of all pairs of real numbers of the form $(x, y)$, where $x \geq 0$, with the standard operations on $R^2$.

Answer: 

Not  a vector  space,  Axioms  5 and  6 fail. 

6. The set of all $n$-tuples of real numbers that have the form $(x, x, \ldots, x)$ with the standard operations on $R^n$.

7. The set of all triples of real numbers with the standard vector addition but with scalar multiplication defined by

\[
k(x, y, z) = (k^2 x, \; k^2 y, \; k^2 z)
\]

Answer: 

Not  a vector  space.  Axiom  8 fails. 

8.  The  set  of  all  2 x 2 invertible  matrices  with  the  standard  matrix  addition  and  scalar  multiplication. 

9. The set of all $2 \times 2$ matrices of the form

\[
\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}
\]

with the standard matrix addition and scalar multiplication.

Answer: 

The  set  is  a vector  space  with  the  given  operations. 

10. The set of all real-valued functions $f$ defined everywhere on the real line and such that $f(1) = 0$ with the operations used in Example 6.

11. The set of all pairs of real numbers of the form $(1, x)$ with the operations

\[
(1, y) + (1, y') = (1, \; y + y') \quad \text{and} \quad k(1, y) = (1, \; ky)
\]

Answer: 

The  set  is  a vector  space  with  the  given  operations. 

12. The set of polynomials of the form $a_0 + a_1 x$ with the operations

\[
(a_0 + a_1 x) + (b_0 + b_1 x) = (a_0 + b_0) + (a_1 + b_1)x
\]

and

\[
k(a_0 + a_1 x) = (ka_0) + (ka_1)x
\]

13.  Verify  Axioms  3,  7,  8,  and  9 for  the  vector  space  given  in  Example  4. 

14.  Verify  Axioms  1,  2,  3,  7,  8,  9,  and  10  for  the  vector  space  given  in  Example  6. 

15. With the addition and scalar multiplication operations defined in Example 7, show that $V = R^2$ satisfies Axioms 1-9.

16. Verify Axioms 1, 2, 3, 6, 8, 9, and 10 for the vector space given in Example 8.

17. Show that the set of all points in $R^2$ lying on a line is a vector space with respect to the standard operations of vector addition and scalar multiplication if and only if the line passes through the origin.

18. Show that the set of all points in $R^3$ lying in a plane is a vector space with respect to the standard operations of vector addition and scalar multiplication if and only if the plane passes through the origin.

In Exercises 19-21, prove that the given set with the stated operations is a vector space.

19. The set $V = \{\mathbf{0}\}$ with the operations of addition and scalar multiplication given in Example 1.

20. The set $R^\infty$ of all infinite sequences of real numbers with the operations of addition and scalar multiplication given in Example 3.

21. The set $M_{mn}$ of all $m \times n$ matrices with the usual operations of addition and scalar multiplication.

22.  Prove  part  ( d)  of  Theorem  4.1.1. 

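23. The argument that follows proves that if $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are vectors in a vector space $V$ such that $\mathbf{u} + \mathbf{w} = \mathbf{v} + \mathbf{w}$, then $\mathbf{u} = \mathbf{v}$ (the cancellation law for vector addition). As illustrated, justify the steps by filling in the blanks.

$\mathbf{u} + \mathbf{w} = \mathbf{v} + \mathbf{w}$    Hypothesis

$(\mathbf{u} + \mathbf{w}) + (-\mathbf{w}) = (\mathbf{v} + \mathbf{w}) + (-\mathbf{w})$    Add $-\mathbf{w}$ to both sides.

$\mathbf{u} + [\mathbf{w} + (-\mathbf{w})] = \mathbf{v} + [\mathbf{w} + (-\mathbf{w})]$    __________

$\mathbf{u} + \mathbf{0} = \mathbf{v} + \mathbf{0}$    __________

$\mathbf{u} = \mathbf{v}$    __________

24. Let $\mathbf{v}$ be any vector in a vector space $V$. Prove that $0\mathbf{v} = \mathbf{0}$.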

25. Below is a seven-step proof of part (b) of Theorem 4.1.1. Justify each step either by stating that it is true by hypothesis or by specifying which of the ten vector space axioms applies.

Hypothesis: Let $\mathbf{u}$ be any vector in a vector space $V$, let $\mathbf{0}$ be the zero vector in $V$, and let $k$ be a scalar.

Conclusion: Then $k\mathbf{0} = \mathbf{0}$.

Proof:

(1) $k\mathbf{0} + k\mathbf{u} = k(\mathbf{0} + \mathbf{u})$

(2) $= k\mathbf{u}$

(3) Since $k\mathbf{u}$ is in $V$, $-k\mathbf{u}$ is in $V$.

(4) Therefore, $(k\mathbf{0} + k\mathbf{u}) + (-k\mathbf{u}) = k\mathbf{u} + (-k\mathbf{u})$.

(5) $k\mathbf{0} + (k\mathbf{u} + (-k\mathbf{u})) = k\mathbf{u} + (-k\mathbf{u})$

(6) $k\mathbf{0} + \mathbf{0} = \mathbf{0}$

(7) $k\mathbf{0} = \mathbf{0}$

26. Let $\mathbf{v}$ be any vector in a vector space $V$. Prove that $-\mathbf{v} = (-1)\mathbf{v}$.

27. Prove: If $\mathbf{u}$ is a vector in a vector space $V$ and $k$ a scalar such that $k\mathbf{u} = \mathbf{0}$, then either $k = 0$ or $\mathbf{u} = \mathbf{0}$. [Suggestion: Show that if $k\mathbf{u} = \mathbf{0}$ and $k \neq 0$, then $\mathbf{u} = \mathbf{0}$. The result then follows as a logical consequence of this.]

True-False  Exercises 

In  parts  (a)-(e)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  A vector  is  a directed  line  segment  (an  arrow). 

Answer: 

False 

(b) A vector is an $n$-tuple of real numbers.

Answer: 

False 

(c)  A vector  is  any  element  of  a vector  space. 

Answer: 

True 

(d)  There  is  a vector  space  consisting  of  exactly  two  distinct  vectors. 

Answer: 

False 

(e)  The  set  of  polynomials  with  degree  exactly  1 is  a vector  space  under  the  operations  defined  in  Exercise  12. 

Answer: 

False 

4.2 Subspaces

It  is  possible  for  one  vector  space  to  be  contained  within  another.  We  will  explore  this  idea  in  this  section,  we 
will  discuss  how  to  recognize  such  vector  spaces,  and  we  will  give  a variety  of  examples  that  will  be  used  in 
our  later  work. 

We  will  begin  with  some  terminology. 


DEFINITION  1 

A subset $W$ of a vector space $V$ is called a subspace of $V$ if $W$ is itself a vector space under the addition and scalar multiplication defined on $V$.


In general, to show that a nonempty set $W$ with two operations is a vector space one must verify the ten vector space axioms. However, if $W$ is a subspace of a known vector space $V$, then certain axioms need not be verified because they are "inherited" from $V$. For example, it is not necessary to verify that $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ holds in $W$ because it holds for all vectors in $V$ including those in $W$. On the other hand, it is necessary to verify that $W$ is closed under addition and scalar multiplication since it is possible that adding two vectors in $W$ or multiplying a vector in $W$ by a scalar produces a vector in $V$ that is outside of $W$ (Figure 4.2.1).


The vectors $\mathbf{u}$ and $\mathbf{v}$ are in $W$, but the vectors $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ are not.

Figure 4.2.1

Those axioms that are not inherited by $W$ are

Axiom 1: Closure of $W$ under addition

Axiom 4: Existence of a zero vector in $W$

Axiom 5: Existence of a negative in $W$ for every vector in $W$

Axiom 6: Closure of $W$ under scalar multiplication

so these must be verified to prove that $W$ is a subspace of $V$. However, the following theorem shows that if Axiom 1 and Axiom 6 hold in $W$, then Axioms 4 and 5 hold in $W$ as a consequence and hence need not be verified.


THEOREM  4.2.1 


If $W$ is a set of one or more vectors in a vector space $V$, then $W$ is a subspace of $V$ if and only if the following conditions hold.

(a) If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $W$, then $\mathbf{u} + \mathbf{v}$ is in $W$.

(b) If $k$ is any scalar and $\mathbf{u}$ is any vector in $W$, then $k\mathbf{u}$ is in $W$.

In words, Theorem 4.2.1 states that $W$ is a
subspace of $V$ if and only if it is closed under
addition and scalar multiplication.

Proof If $W$ is a subspace of $V$, then all the vector space axioms hold in $W$, including Axioms 1 and 6, which are precisely conditions (a) and (b).

Conversely, assume that conditions (a) and (b) hold. Since these are Axioms 1 and 6, and since Axioms 2, 3, 7, 8, 9, and 10 are inherited from $V$, we only need to show that Axioms 4 and 5 hold in $W$. For this purpose, let $\mathbf{u}$ be any vector in $W$. It follows from condition (b) that $k\mathbf{u}$ is a vector in $W$ for every scalar $k$. In particular, $0\mathbf{u} = \mathbf{0}$ and $(-1)\mathbf{u} = -\mathbf{u}$ are in $W$, which shows that Axioms 4 and 5 hold in $W$.

Note  that  every  vector  space  has  at  least  two 
subspaces,  itself  and  its  zero  subspace. 


EXAMPLE  1 The  Zero  Subspace 

If $V$ is any vector space, and if $W = \{\mathbf{0}\}$ is the subset of $V$ that consists of the zero vector only, then $W$ is closed under addition and scalar multiplication since

\[
\mathbf{0} + \mathbf{0} = \mathbf{0} \quad \text{and} \quad k\mathbf{0} = \mathbf{0}
\]

for any scalar $k$. We call $W$ the zero subspace of $V$.


EXAMPLE 2 Lines Through the Origin Are Subspaces of $R^2$ and of $R^3$

If $W$ is a line through the origin of either $R^2$ or $R^3$, then adding two vectors on the line $W$ or multiplying a vector on the line $W$ by a scalar produces another vector on the line $W$, so $W$ is closed under addition and scalar multiplication (see Figure 4.2.2 for an illustration in $R^3$).


(a) $W$ is closed under addition. (b) $W$ is closed under scalar multiplication.

Figure 4.2.2


EXAMPLE 3 Planes Through the Origin Are Subspaces of $R^3$

If $\mathbf{u}$ and $\mathbf{v}$ are vectors in a plane $W$ through the origin of $R^3$, then it is evident geometrically that $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ lie in the same plane $W$ for any scalar $k$ (Figure 4.2.3). Thus $W$ is closed under addition and scalar multiplication.


The vectors $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ both lie in the same plane as $\mathbf{u}$ and $\mathbf{v}$.

Figure 4.2.3

Table 1 that follows gives a list of subspaces of $R^2$ and of $R^3$ that we have encountered thus far. We will see later that these are the only subspaces of $R^2$ and of $R^3$.


Table 1

Subspaces of R^2              Subspaces of R^3

• {0}                         • {0}
• Lines through the origin    • Lines through the origin
• R^2                         • Planes through the origin
                              • R^3

EXAMPLE  4 A Subset  of  R2  That  Is  Not  a Subspace 


Let $W$ be the set of all points $(x, y)$ in $R^2$ for which $x > 0$ and $y > 0$ (the shaded region in Figure 4.2.4). This set is not a subspace of $R^2$ because it is not closed under scalar multiplication. For example, $\mathbf{v} = (1, 1)$ is a vector in $W$, but $(-1)\mathbf{v} = (-1, -1)$ is not.


[Figure 4.2.4: the region $W$ in the $xy$-plane with the points $(1, 1)$ and $(-1, -1)$ marked. $W$ is not closed under scalar multiplication.]
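The counterexample in Example 4 can be checked directly. This is a small sketch in which the membership test `in_W` and the helper `scale` are hypothetical names introduced for illustration; the membership condition is the one stated in the example.

```python
# Membership test for the set W of Example 4; names are illustrative.

def in_W(p):
    """True when the point p = (x, y) satisfies x > 0 and y > 0."""
    x, y = p
    return x > 0 and y > 0

def scale(k, p):
    """Standard scalar multiplication on R^2."""
    return (k * p[0], k * p[1])

v = (1, 1)
print(in_W(v))               # True
print(in_W(scale(-1, v)))    # False: (-1, -1) lies outside W
```

A single failing pair such as this one is enough to show that a set is not a subspace, whereas passing checks on finitely many samples would not prove closure.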


EXAMPLE 5 Subspaces of $M_{nn}$

We know from Theorem 1.7.2 that the sum of two symmetric $n \times n$ matrices is symmetric and that a scalar multiple of a symmetric $n \times n$ matrix is symmetric. Thus, the set of symmetric $n \times n$ matrices is closed under addition and scalar multiplication and hence is a subspace of $M_{nn}$. Similarly, the sets of upper triangular matrices, lower triangular matrices, and diagonal matrices are subspaces of $M_{nn}$.


EXAMPLE 6 A Subset of $M_{nn}$ That Is Not a Subspace

The set $W$ of invertible $n \times n$ matrices is not a subspace of $M_{nn}$, failing on two counts: it is not closed under addition and not closed under scalar multiplication. We will illustrate this with an example in $M_{22}$ that you can readily adapt to $M_{nn}$. Consider the matrices

\[
U = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}
\quad \text{and} \quad
V = \begin{bmatrix} -1 & 2 \\ -2 & 5 \end{bmatrix}
\]

The matrix $0U$ is the $2 \times 2$ zero matrix and hence is not invertible, and the matrix $U + V$ has a column of zeros, so it also is not invertible.
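The two failures in Example 6 can be confirmed with determinants, since a $2 \times 2$ matrix is invertible exactly when its determinant is nonzero. The sketch below assumes matrices are stored as nested tuples; `det2`, `add`, and `scale` are illustrative helper names.

```python
# 2 x 2 matrices as nested tuples ((a, b), (c, d)); names are illustrative.

def det2(m):
    """Determinant ad - bc of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c

def add(m, n):
    """Entrywise matrix addition."""
    return tuple(tuple(x + y for x, y in zip(r, s)) for r, s in zip(m, n))

def scale(k, m):
    """Entrywise scalar multiplication."""
    return tuple(tuple(k * x for x in r) for r in m)

U = ((1, 2), (2, 5))
V = ((-1, 2), (-2, 5))

print(det2(U), det2(V))      # 1 -1, so U and V are both invertible
print(add(U, V))             # ((0, 4), (0, 10)), a column of zeros
print(det2(add(U, V)))       # 0, so U + V is not invertible
print(det2(scale(0, U)))     # 0, so 0U is not invertible
```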


CALCULUS  REQUIRED 

EXAMPLE 7 The Subspace $C(-\infty, \infty)$

There is a theorem in calculus which states that a sum of continuous functions is continuous and that a constant times a continuous function is continuous. Rephrased in vector language, the set of continuous functions on $(-\infty, \infty)$ is a subspace of $F(-\infty, \infty)$. We will denote this subspace by $C(-\infty, \infty)$.


CALCULUS  REQUIRED 

EXAMPLE  8 Functions  with  Continuous  Derivatives 

A function with a continuous derivative is said to be continuously differentiable. There is a theorem in calculus which states that the sum of two continuously differentiable functions is continuously differentiable and that a constant times a continuously differentiable function is continuously differentiable. Thus, the functions that are continuously differentiable on $(-\infty, \infty)$ form a subspace of $F(-\infty, \infty)$. We will denote this subspace by $C^1(-\infty, \infty)$, where the superscript emphasizes that the first derivative is continuous. To take this a step further, the set of functions with $m$ continuous derivatives on $(-\infty, \infty)$ is a subspace of $F(-\infty, \infty)$, as is the set of functions with derivatives of all orders on $(-\infty, \infty)$. We will denote these subspaces by $C^m(-\infty, \infty)$ and $C^\infty(-\infty, \infty)$, respectively.


EXAMPLE  9 The  Subspace  of  All  Polynomials 

Recall that a polynomial is a function that can be expressed in the form

\[
p(x) = a_0 + a_1 x + \cdots + a_n x^n \tag{1}
\]

where $a_0, a_1, \ldots, a_n$ are constants. It is evident that the sum of two polynomials is a polynomial and that a constant times a polynomial is a polynomial. Thus, the set $W$ of all polynomials is closed under addition and scalar multiplication and hence is a subspace of $F(-\infty, \infty)$. We will denote this space by $P_\infty$.


EXAMPLE 10 The Subspace of Polynomials of Degree $\leq n$

Recall that the degree of a polynomial is the highest power of the variable that occurs with a nonzero coefficient. Thus, for example, if $a_n \neq 0$ in Formula 1, then that polynomial has degree $n$. It is not true that the set $W$ of polynomials with positive degree $n$ is a subspace of $F(-\infty, \infty)$ because that set is not closed under addition. For example, the polynomials

\[
1 + 2x + 3x^2 \quad \text{and} \quad 5 + 7x - 3x^2
\]

both have degree 2, but their sum has degree 1. What is true, however, is that for each nonnegative integer $n$ the polynomials of degree $n$ or less form a subspace of $F(-\infty, \infty)$. We will denote this space by $P_n$.
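The degree drop in Example 10 can be seen by adding coefficient lists. The following sketch assumes a polynomial $a_0 + a_1 x + \cdots + a_n x^n$ is stored as the list `[a0, a1, ..., an]`; the function names are illustrative, and `degree` assigns degree 0 to every constant, following the convention stated in this text.

```python
# Polynomials as coefficient lists [a0, a1, ..., an]; names are illustrative.

def poly_add(p, q):
    """Add two polynomials coefficient by coefficient."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))   # pad the shorter list with zeros
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def degree(p):
    """Highest power with a nonzero coefficient; constants have degree 0."""
    for i in range(len(p) - 1, -1, -1):
        if p[i] != 0:
            return i
    return 0

p = [1, 2, 3]     # 1 + 2x + 3x^2
q = [5, 7, -3]    # 5 + 7x - 3x^2
s = poly_add(p, q)

print(degree(p), degree(q))   # 2 2
print(s, degree(s))           # [6, 9, 0] 1
```

The sum still fits in a list of length 3, which reflects the fact that polynomials of degree 2 or less are closed under addition even though polynomials of degree exactly 2 are not.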


In  this  text  we  regard  all  constants  to  be 
polynomials  of  degree  zero.  Be  aware,  however, 
that  some  authors  do  not  assign  a degree  to  the 
constant  0. 


The  Hierarchy  of  Function  Spaces 

It is proved in calculus that polynomials are continuous functions and have continuous derivatives of all orders on $(-\infty, \infty)$. Thus, it follows that $P_\infty$ is not only a subspace of $F(-\infty, \infty)$, as previously observed, but is also a subspace of $C^\infty(-\infty, \infty)$. We leave it for you to convince yourself that the vector spaces discussed in Example 7 to Example 10 are "nested" one inside the other as illustrated in Figure 4.2.5.


[Figure 4.2.5: nested regions labeled $C^\infty(-\infty, \infty)$, $C^m(-\infty, \infty)$, $C^1(-\infty, \infty)$, $C(-\infty, \infty)$, and $F(-\infty, \infty)$.]


In our previous examples, and as illustrated in Figure 4.2.5, we have only considered functions that are defined at all points of the interval $(-\infty, \infty)$. Sometimes we will want to consider functions that are only defined on some subinterval of $(-\infty, \infty)$, say the closed interval $[a, b]$ or the open interval $(a, b)$. In such cases we will make an appropriate notation change. For example, $C[a, b]$ is the space of continuous functions on $[a, b]$ and $C(a, b)$ is the space of continuous functions on $(a, b)$.


Building  Subspaces 

The  following  theorem  provides  a useful  way  of  creating  a new  subspace  from  known  subspaces. 


THEOREM  4.2.2 

If $W_1, W_2, \ldots, W_r$ are subspaces of a vector space $V$, then the intersection of these subspaces is also a subspace of $V$.


Note  that  the  first  step  in  proving  Theorem  4.2.2 
was  to  establish  that  W contained  at  least  one 
vector.  This  is  important,  for  otherwise  the 
subsequent  argument  might  be  logically  correct 
but  meaningless. 


Proof Let $W$ be the intersection of the subspaces $W_1, W_2, \ldots, W_r$. This set is not empty because each of these subspaces contains the zero vector of $V$, and hence so does their intersection. Thus, it remains to show that $W$ is closed under addition and scalar multiplication.

To prove closure under addition, let $\mathbf{u}$ and $\mathbf{v}$ be vectors in $W$. Since $W$ is the intersection of $W_1, W_2, \ldots, W_r$, it follows that $\mathbf{u}$ and $\mathbf{v}$ also lie in each of these subspaces. Since these subspaces are all closed under addition, they all contain the vector $\mathbf{u} + \mathbf{v}$ and hence so does their intersection $W$. This proves that $W$ is closed under addition. We leave the proof that $W$ is closed under scalar multiplication to you.

Sometimes  we  will  want  to  find  the  “smallest”  subspace  of  a vector  space  V that  contains  all  of  the  vectors  in 
some  set  of  interest.  The  following  definition,  which  generalizes  Definition  4 of  Section  3.1,  will  help  us  to  do 
that. 


If $r = 1$, then Equation 2 has the form
$\mathbf{w} = k_1 \mathbf{v}_1$, in which case the linear combination
is just a scalar multiple of $\mathbf{v}_1$.


DEFINITION  2 

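If $\mathbf{w}$ is a vector in a vector space $V$, then $\mathbf{w}$ is said to be a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ in $V$ if $\mathbf{w}$ can be expressed in the form

\[
\mathbf{w} = k_1 \mathbf{v}_1 + k_2 \mathbf{v}_2 + \cdots + k_r \mathbf{v}_r \tag{2}
\]

where $k_1, k_2, \ldots, k_r$ are scalars. These scalars are called the coefficients of the linear combination.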


THEOREM  4.2.3 

If $S = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r\}$ is a nonempty set of vectors in a vector space $V$, then:

(a) The set $W$ of all possible linear combinations of the vectors in $S$ is a subspace of $V$.

(b) The set $W$ in part (a) is the "smallest" subspace of $V$ that contains all of the vectors in $S$ in the sense that any other subspace that contains those vectors contains $W$.


Proof (a) Let $W$ be the set of all possible linear combinations of the vectors in $S$. We must show that $W$ is closed under addition and scalar multiplication. To prove closure under addition, let

\[
\mathbf{u} = c_1 \mathbf{w}_1 + c_2 \mathbf{w}_2 + \cdots + c_r \mathbf{w}_r
\quad \text{and} \quad
\mathbf{v} = k_1 \mathbf{w}_1 + k_2 \mathbf{w}_2 + \cdots + k_r \mathbf{w}_r
\]

be two vectors in $W$. It follows that their sum can be written as

\[
\mathbf{u} + \mathbf{v} = (c_1 + k_1)\mathbf{w}_1 + (c_2 + k_2)\mathbf{w}_2 + \cdots + (c_r + k_r)\mathbf{w}_r
\]

which is a linear combination of the vectors in $S$. Thus, $W$ is closed under addition. We leave it for you to prove that $W$ is also closed under scalar multiplication and hence is a subspace of $V$.

Proof (b) Let $W'$ be any subspace of $V$ that contains all of the vectors in $S$. Since $W'$ is closed under addition and scalar multiplication, it contains all linear combinations of the vectors in $S$ and hence contains $W$.
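One possible way to carry out the scalar multiplication step that part (a) leaves to the reader is sketched below. The scalar is called $a$ here, a symbol chosen only to avoid clashing with the coefficients already in use, and the vector is written with coefficients $c_1, \ldots, c_r$ as in the addition argument.

```latex
% Closure of W under scalar multiplication (a sketch of the omitted step).
% Assumption: u is an arbitrary linear combination of w_1, ..., w_r,
% and a is an arbitrary scalar.
\[
\begin{aligned}
a\mathbf{u}
  &= a\,(c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + \cdots + c_r\mathbf{w}_r) \\
  &= a(c_1\mathbf{w}_1) + a(c_2\mathbf{w}_2) + \cdots + a(c_r\mathbf{w}_r)
     && \text{[Axiom 7]} \\
  &= (ac_1)\mathbf{w}_1 + (ac_2)\mathbf{w}_2 + \cdots + (ac_r)\mathbf{w}_r
     && \text{[Axiom 9]}
\end{aligned}
\]
```

The last line is again a linear combination of $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r$, so the scalar multiple belongs to the set of all such combinations.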


The  following  definition  gives  some  important  notation  and  terminology  related  to  Theorem  4.2.3. 


DEFINITION  3 

The subspace of a vector space $V$ that is formed from all possible linear combinations of the vectors in a nonempty set $S$ is called the span of $S$, and we say that the vectors in $S$ span that subspace. If $S = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r\}$, then we denote the span of $S$ by

\[
\operatorname{span}\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r\}
\quad \text{or} \quad
\operatorname{span}(S)
\]


EXAMPLE  11  The  Standard  Unit  Vectors  Span  Rn 

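Recall that the standard unit vectors in $R^n$ are

\[
\mathbf{e}_1 = (1, 0, 0, \ldots, 0), \quad
\mathbf{e}_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad
\mathbf{e}_n = (0, 0, 0, \ldots, 1)
\]

These vectors span $R^n$ since every vector $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ in $R^n$ can be expressed as

\[
\mathbf{v} = v_1 \mathbf{e}_1 + v_2 \mathbf{e}_2 + \cdots + v_n \mathbf{e}_n
\]

which is a linear combination of $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$. Thus, for example, the vectors

\[
\mathbf{i} = (1, 0, 0), \quad \mathbf{j} = (0, 1, 0), \quad \mathbf{k} = (0, 0, 1)
\]

span $R^3$ since every vector $\mathbf{v} = (a, b, c)$ in this space can be expressed as

\[
\mathbf{v} = (a, b, c) = a(1, 0, 0) + b(0, 1, 0) + c(0, 0, 1) = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}
\]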
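The statement that the standard unit vectors span $R^n$ says that the components of a vector serve as the coefficients of the linear combination. A minimal sketch follows, assuming vectors are Python tuples; `unit_vector` and `linear_combination` are hypothetical helper names, and indices start at 0 in the code although the text numbers the vectors from 1.

```python
# Standard unit vectors and linear combinations; names are illustrative.

def unit_vector(i, n):
    """The unit vector in R^n with a 1 in position i (0-based) and 0 elsewhere."""
    return tuple(1 if j == i else 0 for j in range(n))

def linear_combination(coeffs, vectors):
    """Return coeffs[0]*vectors[0] + ... + coeffs[r-1]*vectors[r-1]."""
    n = len(vectors[0])
    return tuple(
        sum(c * vec[j] for c, vec in zip(coeffs, vectors))
        for j in range(n)
    )

v = (4, -2, 7)
basis = [unit_vector(i, 3) for i in range(3)]   # i, j, k in R^3

# Using the components of v as coefficients rebuilds v itself.
print(linear_combination(v, basis) == v)
```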


EXAMPLE 12 A Geometric View of Spanning in $R^2$ and $R^3$

(a) If $\mathbf{v}$ is a nonzero vector in $R^2$ or $R^3$ that has its initial point at the origin, then $\operatorname{span}\{\mathbf{v}\}$, which is the set of all scalar multiples of $\mathbf{v}$, is the line through the origin determined by $\mathbf{v}$. You should be able to visualize this from Figure 4.2.6a by observing that the tip of the vector $k\mathbf{v}$ can be made to fall at any point on the line by choosing the value of $k$ appropriately.


George  William  Hill  (1838-1914) 

The  terms  linearly  independent  and  linearly  dependent  were 
introduced  by  Maxime  Bocher  (see  p.  7)  in  his  book  Introduction  to  Higher  Algebra, 
published  in  1907.  The  term  linear  combination  is  due  to  the  American  mathematician 
G.  W.  Hill,  who  introduced  it  in  a research  paper  on  planetary  motion  published  in 
1900.  Hill  was  a “loner”  who  preferred  to  work  out  of  his  home  in  West  Nyack,  New 
York,  rather  than  in  academia,  though  he  did  try  lecturing  at  Columbia  University  for  a 
few  years.  Interestingly,  he  apparently  returned  the  teaching  salary,  indicating  that  he 
did  not  need  the  money  and  did  not  want  to  be  bothered  looking  after  it.  Although 
technically  a mathematician,  Hill  had  little  interest  in  modern  developments  of 
mathematics  and  worked  almost  entirely  on  the  theory  of  planetary  orbits. 

[Image:  Courtesy  of  the  American  Mathematical  Society] 


(b) If v_1 and v_2 are nonzero vectors in R^3 that have their initial points at the origin, then
span{v_1, v_2}, which consists of all linear combinations of v_1 and v_2, is the plane through the
origin determined by these two vectors. You should be able to visualize this from Figure 4.2.6b
by observing that the tip of the vector k_1 v_1 + k_2 v_2 can be made to fall at any point in the
plane by adjusting the scalars k_1 and k_2 to lengthen, shorten, or reverse the directions of the
vectors k_1 v_1 and k_2 v_2 appropriately.


Figure 4.2.6 (a) span{v} is the line through the origin determined by v. (b) span{v_1, v_2} is the plane through the origin determined by v_1 and v_2.


EXAMPLE 13 A Spanning Set for P_n

The polynomials 1, x, x^2, ..., x^n span the vector space P_n defined in Example 10 since each
polynomial p in P_n can be written as

p = a_0 + a_1 x + ... + a_n x^n

which is a linear combination of 1, x, x^2, ..., x^n. We can denote this by writing

P_n = span{1, x, x^2, ..., x^n}


The  next  two  examples  are  concerned  with  two  important  types  of  problems: 

Given a set S of vectors in R^n and a vector v in R^n, determine whether v is a linear combination of the
vectors in S.

Given a set S of vectors in R^n, determine whether the vectors span R^n.

EXAMPLE  14  Linear  Combinations 

Consider the vectors u = (1, 2, -1) and v = (6, 4, 2) in R^3. Show that w = (9, 2, 7) is a
linear combination of u and v and that w' = (4, -1, 8) is not a linear combination of u and v.

Solution In order for w to be a linear combination of u and v, there must be scalars k_1 and k_2
such that w = k_1 u + k_2 v; that is,

(9, 2, 7) = k_1(1, 2, -1) + k_2(6, 4, 2)

or

(9, 2, 7) = (k_1 + 6k_2, 2k_1 + 4k_2, -k_1 + 2k_2)

Equating corresponding components gives

k_1 + 6k_2 = 9
2k_1 + 4k_2 = 2
-k_1 + 2k_2 = 7

Solving this system using Gaussian elimination yields k_1 = -3, k_2 = 2, so

w = -3u + 2v

Similarly, for w' to be a linear combination of u and v, there must be scalars k_1 and k_2 such that
w' = k_1 u + k_2 v; that is,

(4, -1, 8) = k_1(1, 2, -1) + k_2(6, 4, 2)

or

(4, -1, 8) = (k_1 + 6k_2, 2k_1 + 4k_2, -k_1 + 2k_2)

Equating corresponding components gives

k_1 + 6k_2 = 4
2k_1 + 4k_2 = -1
-k_1 + 2k_2 = 8

This system of equations is inconsistent (verify), so no such scalars k_1 and k_2 exist.
Consequently, w' is not a linear combination of u and v.
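The test in Example 14 can also be carried out mechanically. The following is a minimal Python sketch rather than the method of the text: it relies on the standard fact that a vector lies in span{u, v} exactly when appending it to the list does not raise the rank. The helper names rank and is_linear_combination are illustrative assumptions.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_linear_combination(vectors, target):
    """True if target is in span(vectors): adding it must not increase the rank."""
    return rank(vectors) == rank(vectors + [target])

u, v = [1, 2, -1], [6, 4, 2]
print(is_linear_combination([u, v], [9, 2, 7]))   # the vector w of Example 14
print(is_linear_combination([u, v], [4, -1, 8]))  # the vector w' of Example 14
```

Exact fractions are used instead of floating point so that the rank comparison is not affected by rounding.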


EXAMPLE  15  Testing  for  Spanning 


Determine whether v_1 = (1, 1, 2), v_2 = (1, 0, 1), and v_3 = (2, 1, 3) span the vector space R^3.

Solution We must determine whether an arbitrary vector b = (b_1, b_2, b_3) in R^3 can be
expressed as a linear combination

b = k_1 v_1 + k_2 v_2 + k_3 v_3

of the vectors v_1, v_2, and v_3. Expressing this equation in terms of components gives

(b_1, b_2, b_3) = k_1(1, 1, 2) + k_2(1, 0, 1) + k_3(2, 1, 3)

or

(b_1, b_2, b_3) = (k_1 + k_2 + 2k_3, k_1 + k_3, 2k_1 + k_2 + 3k_3)

or

k_1 + k_2 + 2k_3 = b_1
k_1 + k_3 = b_2
2k_1 + k_2 + 3k_3 = b_3

Thus, our problem reduces to ascertaining whether this system is consistent for all values of b_1,
b_2, and b_3. One way of doing this is to use parts (e) and (g) of Theorem 2.3.8, which state that
the system is consistent for every choice of b_1, b_2, and b_3 if and only if its coefficient matrix

A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 3 \end{bmatrix}

has a nonzero determinant. But this is not the case here; we leave it for you to confirm that
det(A) = 0, so v_1, v_2, and v_3 do not span R^3.
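The determinant check in Example 15 is easy to confirm numerically. The sketch below assumes only the cofactor expansion of a 3 × 3 determinant; the function name det3 is illustrative and not part of the text.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Coefficient matrix of Example 15; its columns are v_1, v_2, v_3.
A = [[1, 1, 2],
     [1, 0, 1],
     [2, 1, 3]]

# Three vectors span R^3 exactly when this determinant is nonzero.
spans_r3 = det3(A) != 0
print(det3(A), spans_r3)

# For contrast, the standard unit vectors i, j, k give the identity matrix.
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(det3(I3), det3(I3) != 0)
```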


Solution  Spaces  of  Homogeneous  Systems 


The solutions of a homogeneous linear system Ax = 0 of m equations in n unknowns can be viewed as vectors
in  Rn.  The  following  theorem  provides  a useful  insight  into  the  geometric  structure  of  the  solution  set. 


THEOREM  4.2.4 


The solution set of a homogeneous linear system Ax = 0 in n unknowns is a subspace of R^n.


Proof Let W be the solution set for the system. The set W is not empty because it contains at least the trivial
solution x = 0.

To show that W is a subspace of R^n, we must show that it is closed under addition and scalar multiplication. To
do this, let x_1 and x_2 be vectors in W. Since these vectors are solutions of Ax = 0, we have

Ax_1 = 0 and Ax_2 = 0

It follows from these equations and the distributive property of matrix multiplication that

A(x_1 + x_2) = Ax_1 + Ax_2 = 0 + 0 = 0

so W is closed under addition. Similarly, if k is any scalar then

A(kx_1) = kAx_1 = k0 = 0

so W is also closed under scalar multiplication.

Because  the  solution  set  of  a homogeneous 
system  in  n unknowns  is  actually  a subspace  of 
Rn,  we  will  generally  refer  to  it  as  the  solution 
space  of  the  system. 


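EXAMPLE 16 Solution Spaces of Homogeneous Systems

Consider the linear systems

(a) \begin{bmatrix} 1 & -2 & 3 \\ 2 & -4 & 6 \\ 3 & -6 & 9 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}

(b) \begin{bmatrix} 1 & -2 & 3 \\ -3 & 7 & -8 \\ -2 & 4 & -6 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}

(c) \begin{bmatrix} 1 & -2 & 3 \\ -3 & 7 & -8 \\ 4 & 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}

(d) \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}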

Solution

(a) We leave it for you to verify that the solutions are

x = 2s - 3t, y = s, z = t

from which it follows that

x = 2y - 3z or x - 2y + 3z = 0

This is the equation of a plane through the origin that has n = (1, -2, 3) as a normal.

(b) We leave it for you to verify that the solutions are

x = -5t, y = -t, z = t

which are parametric equations for the line through the origin that is parallel to the vector
v = (-5, -1, 1).

(c) We leave it for you to verify that the only solution is x = 0, y = 0, z = 0, so the solution
space is {0}.

(d) This linear system is satisfied by all real values of x, y, and z, so the solution space is all of R^3.
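The four outcomes in Example 16 can be told apart by a single number. The sketch below is an illustration under an assumption not yet established at this point in the text: that the solution space of Ax = 0 in n unknowns has dimension n minus the rank of A. The names rank, solution_space_dimension, and SHAPES are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def solution_space_dimension(A):
    """Number of free parameters in the general solution of Ax = 0."""
    return len(A[0]) - rank(A)

SHAPES = {0: "origin only", 1: "line through the origin",
          2: "plane through the origin", 3: "all of R^3"}

systems = {
    "a": [[1, -2, 3], [2, -4, 6], [3, -6, 9]],
    "b": [[1, -2, 3], [-3, 7, -8], [-2, 4, -6]],
    "c": [[1, -2, 3], [-3, 7, -8], [4, 1, 2]],
    "d": [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
}

dims = {name: solution_space_dimension(A) for name, A in systems.items()}
for name, d in dims.items():
    print(name, d, SHAPES[d])
```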


Whereas  the  solution  set  of  every  homogeneous  system  of  m equations  in  n unknowns  is  a subspace 
of  Rn,  it  is  never  true  that  the  solution  set  of  a nonhomogeneous  system  of  m equations  in  n unknowns  is  a 
subspace  of  Rn.  There  are  two  possible  scenarios:  first,  the  system  may  not  have  any  solutions  at  all,  and 
second,  if  there  are  solutions,  then  the  solution  set  will  not  be  closed  under  either  addition  or  under  scalar 
multiplication  (Exercise  18). 
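A single small system is enough to see the failure of closure. The sketch below uses the hypothetical nonhomogeneous equation x + y = 1, which is not taken from the text, and checks that neither the sum of two solutions nor a scalar multiple of a solution is again a solution.

```python
def is_solution(p):
    """True if the point p = (x, y) satisfies the illustrative equation x + y = 1."""
    return p[0] + p[1] == 1

p, q = (1, 0), (0, 1)                # two solutions of x + y = 1
total = (p[0] + q[0], p[1] + q[1])   # their sum, (1, 1)
doubled = (2 * p[0], 2 * p[1])       # the scalar multiple 2p, (2, 0)

print(is_solution(p), is_solution(q))
print(is_solution(total), is_solution(doubled))
```

The zero vector also fails to satisfy the equation, which is another way to see that the solution set cannot be a subspace.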


A Concluding  Observation 

It  is  important  to  recognize  that  spanning  sets  are  not  unique.  For  example,  any  nonzero  vector  on  the  line  in 
Figure 4.2.6a will span that line, and any two noncollinear vectors in the plane in Figure 4.2.6b will span that
plane.  The  following  theorem,  whose  proof  we  leave  as  an  exercise,  states  conditions  under  which  two  sets  of 
vectors  will  span  the  same  space. 


THEOREM  4.2.5 

If S = {v_1, v_2, ..., v_r} and S' = {w_1, w_2, ..., w_k} are nonempty sets of vectors in a vector space V,
then

span{v_1, v_2, ..., v_r} = span{w_1, w_2, ..., w_k}

if  and  only  if  each  vector  in  S is  a linear  combination  of  those  in  S',  and  each  vector  in  S'  is  a linear 
combination  of  those  in  S. 
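For vectors in R^n, the condition in Theorem 4.2.5 can be checked with ranks. This is a sketch under the assumption that "every vector of S is a linear combination of those in S'" is equivalent to "appending S to S' does not raise the rank". The vectors are those of Exercise 20, and the names rank and same_span are illustrative.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def same_span(S, T):
    """True if the lists of vectors S and T span the same subspace of R^n."""
    combined = rank(S + T)
    return rank(S) == combined and rank(T) == combined

S = [[1, 6, 4], [2, 4, -1], [-1, 2, 5]]
T = [[1, -2, -5], [0, 8, 9]]
print(same_span(S, T))
print(same_span([[1, 0, 0]], [[0, 1, 0]]))  # two different lines through the origin
```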


Concept  Review 

Subspace 


Zero  subspace 
Examples  of  subspaces 
Linear  combination 
Span 

Solution  space 

Skills 

Determine  whether  a subset  of  a vector  space  is  a subspace. 

Show  that  a subset  of  a vector  space  is  a subspace. 

Show  that  a nonempty  subset  of  a vector  space  is  not  a subspace  by  demonstrating  that  the  set  is 
either  not  closed  under  addition  or  not  closed  under  scalar  multiplication. 

Given a set S of vectors in R^n and a vector v in R^n, determine whether v is a linear combination of
the vectors in S.

Given a set S of vectors in R^n, determine whether the vectors in S span R^n.

Determine  whether  two  nonempty  sets  of  vectors  in  a vector  space  V span  the  same  subspace  of  V. 


Exercise  Set  4.2 

1. Use Theorem 4.2.1 to determine which of the following are subspaces of R^3.

(a) All vectors of the form (a, 0, 0).

(b) All vectors of the form (a, 1, 1).

(c) All vectors of the form (a, b, c), where b = a + c.

(d) All vectors of the form (a, b, c), where b = a + c + 1.

(e) All vectors of the form (a, b, 0).

Answer: 

(a),  (c),  (e) 

2. Use Theorem 4.2.1 to determine which of the following are subspaces of M_nn.

(a) The set of all diagonal n × n matrices.

(b) The set of all n × n matrices A such that det(A) = 0.

(c) The set of all n × n matrices A such that tr(A) = 0.

(d) The set of all symmetric n × n matrices.

(e) The set of all n × n matrices A such that A^T = -A.

(f) The set of all n × n matrices A for which Ax = 0 has only the trivial solution.

(g) The set of all n × n matrices A such that AB = BA for some fixed n × n matrix B.

3. Use Theorem 4.2.1 to determine which of the following are subspaces of P_3.

(a) All polynomials a_0 + a_1 x + a_2 x^2 + a_3 x^3 for which a_0 = 0.

(b) All polynomials a_0 + a_1 x + a_2 x^2 + a_3 x^3 for which a_0 + a_1 + a_2 + a_3 = 0.

(c) All polynomials of the form a_0 + a_1 x + a_2 x^2 + a_3 x^3 in which a_0, a_1, a_2, and a_3 are integers.

(d) All polynomials of the form a_0 + a_1 x, where a_0 and a_1 are real numbers.

Answer: 

(a),(b),  (d) 

4. Which of the following are subspaces of F(-∞, ∞)?

(a) All functions f in F(-∞, ∞) for which f(0) = 0.

(b) All functions f in F(-∞, ∞) for which f(0) = 1.

(c) All functions f in F(-∞, ∞) for which f(-x) = f(x).

(d) All polynomials of degree 2.

5. Which of the following are subspaces of R^∞?

(a) All sequences v in R^∞ of the form v = (v, 0, v, 0, v, 0, ...).

(b) All sequences v in R^∞ of the form v = (v, 1, v, 1, v, 1, ...).

(c) All sequences v in R^∞ of the form v = (v, 2v, 4v, 8v, 16v, ...).

(d) All sequences in R^∞ whose components are 0 from some point on.

Answer: 

(a),  (c),  (d) 

6. A line L through the origin in R^3 can be represented by parametric equations of the form x = at, y = bt,
and z = ct. Use these equations to show that L is a subspace of R^3 by showing that if v_1 = (x_1, y_1, z_1) and
v_2 = (x_2, y_2, z_2) are points on L and k is any real number, then kv_1 and v_1 + v_2 are also points on L.

7. Which of the following are linear combinations of u = (0, -2, 2) and v = (1, 3, -1)?

(a)  (2,2,2) 

(b)  (3,1,5) 

(c)  (0,4,5) 

(d)  (0,  0,  0) 

Answer: 

(a),  (b),  (d) 

8. Express the following as linear combinations of u = (2, 1, 4), v = (1, -1, 3), and w = (3, 2, 5).

(a)  (-9,  -7,  -15) 

(b)  (6,11,6) 

(c)  (0,0,0) 

(d)  (7,8,9) 


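9. Which of the following are linear combinations of the given matrices? [Matrix definitions missing from the source; only a fragment of C survives.]

(a) \begin{bmatrix} 6 & -8 \\ -1 & -8 \end{bmatrix}

(b) \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}

(c) \begin{bmatrix} 6 & 0 \\ 3 & 8 \end{bmatrix}

(d) \begin{bmatrix} -1 & 5 \\ 7 & 1 \end{bmatrix}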

Answer: 

(a),(b),  (c) 

10. In each part express the vector as a linear combination of p_1 = 2 + x + 4x^2, p_2 = 1 - x + 3x^2, and
p_3 = 3 + 2x + 5x^2.

(a) -9 - 7x - 15x^2

(b) 6 + 11x + 6x^2

(c) 0

(d) 7 + 8x + 9x^2

11. In each part, determine whether the given vectors span R^3.

(a) v_1 = (2, 2, 2), v_2 = (0, 0, 3), v_3 = (0, 1, 1)

(b) v_1 = (2, -1, 3), v_2 = (4, 1, 2), v_3 = (8, -1, 8)

(c) v_1 = (3, 1, 4), v_2 = (2, -3, 5), v_3 = (5, -2, 9), v_4 = (1, 4, -1)

(d) v_1 = (1, 2, 6), v_2 = (3, 4, 1), v_3 = (4, 3, 1), v_4 = (3, 3, 1)

Answer:

(a) The vectors span R^3.

(b) The vectors do not span R^3.

(c) The vectors do not span R^3.

(d) The vectors span R^3.

12. Suppose that v_1 = (2, 1, 0, 3), v_2 = (3, -1, 5, 2), and v_3 = (-1, 0, 2, 1). Which of the following
vectors are in span{v_1, v_2, v_3}?

(a)  (2,3,  -7,3) 

(b)  (0,  0,  0,  0) 

(c)  (1,1, 1,  1) 

(d)  (-4,6,  -13,4) 


13. Determine whether the following polynomials span P_2.

p_1 = 1 - x + 2x^2, p_2 = 3 + x,
p_3 = 5 - x + 4x^2, p_4 = -2 - 2x + 2x^2

Answer:

The polynomials do not span P_2.

14. Let f = cos^2 x and g = sin^2 x. Which of the following lie in the space spanned by f and g?

(a)  cos  2x 

(b)  3+x2 

(c)  1 

(d)  sinx 

(e)  0 

15.  Determine  whether  the  solution  space  of  the  system  Ax  = 0 is  a line  through  the  origin,  a plane  through  the 
origin,  or  the  origin  only.  If  it  is  a plane,  find  an  equation  for  it.  If  it  is  a line,  find  parametric  equations  for 
it. 

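(a) A = \begin{bmatrix} -1 & 1 & 1 \\ 3 & -1 & 0 \\ 2 & -4 & -5 \end{bmatrix}

(b) A = \begin{bmatrix} 1 & -2 & 3 \\ -3 & 6 & 9 \\ -2 & 4 & -6 \end{bmatrix}

(c) A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}

(d) A = \begin{bmatrix} 1 & 2 & -6 \\ 1 & 4 & 4 \\ 3 & 10 & 6 \end{bmatrix}

(e) A = \begin{bmatrix} 1 & -1 & 1 \\ 2 & -1 & 4 \\ 3 & 1 & 11 \end{bmatrix}

(f) A = \begin{bmatrix} 1 & -3 & 1 \\ 2 & -6 & 2 \\ 3 & -9 & 3 \end{bmatrix}

Answer:

(a) Line; x = -\frac{1}{2}t, y = -\frac{3}{2}t, z = t

(b) Line; x = 2t, y = t, z = 0

(c) Origin

(d) Origin

(e) Line; x = -3t, y = -2t, z = t

(f) Plane; x - 3y + z = 0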

16. (Calculus required) Show that the following sets of functions are subspaces of F(-∞, ∞).

(a) All continuous functions on (-∞, ∞).

(b) All differentiable functions on (-∞, ∞).

(c) All differentiable functions on (-∞, ∞) that satisfy f' + 2f = 0.

17. (Calculus required) Show that the set of continuous functions f = f(x) on [a, b] such that

\int_a^b f(x) dx = 0

is a subspace of C[a, b].

18. Show that the solution vectors of a consistent nonhomogeneous system of m linear equations in n
unknowns do not form a subspace of R^n.

19. Prove Theorem 4.2.5.

20. Use Theorem 4.2.5 to show that the vectors v_1 = (1, 6, 4), v_2 = (2, 4, -1), v_3 = (-1, 2, 5), and the
vectors w_1 = (1, -2, -5), w_2 = (0, 8, 9) span the same subspace of R^3.

True-False  Exercises 

In  parts  (a)-(k)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  Every  subspace  of  a vector  space  is  itself  a vector  space. 

Answer: 

True 

(b)  Every  vector  space  is  a subspace  of  itself. 

Answer: 

True 

(c)  Every  subset  of  a vector  space  V that  contains  the  zero  vector  in  V is  a subspace  of  V. 

Answer: 

False 

(d) The set R^2 is a subspace of R^3.

Answer: 

False 

(e) The solution set of a consistent linear system Ax = b of m equations in n unknowns is a subspace of R^n.


Answer: 


False 

(f)  The  span  of  any  finite  set  of  vectors  in  a vector  space  is  closed  under  addition  and  scalar  multiplication. 
Answer: 

True 

(g)  The  intersection  of  any  two  subspaces  of  a vector  space  V is  a subspace  of  V. 

Answer: 

True 

(h)  The  union  of  any  two  subspaces  of  a vector  space  V is  a subspace  of  V. 

Answer: 

False 

(i)  Two  subsets  of  a vector  space  V that  span  the  same  subspace  of  V must  be  equal. 

Answer: 

False 

(j) The set of upper triangular n × n matrices is a subspace of the vector space of all n × n matrices.
Answer: 

True 

(k) The polynomials x - 1, (x - 1)^2, and (x - 1)^3 span P_3.

Answer: 

False 
4.3 Linear Independence

In  this  section  we  will  consider  the  question  of  whether  the  vectors  in  a given  set  are  interrelated  in  the  sense 
that  one  or  more  of  them  can  be  expressed  as  a linear  combination  of  the  others.  This  is  important  to  know  in 
applications  because  the  existence  of  such  relationships  often  signals  that  some  kind  of  complication  is  likely 
to  occur. 


Extraneous  Vectors 


In  a rectangular  xy-coordinate  system  every  vector  in  the  plane  can  be  expressed  in  exactly  one  way  as  a 
linear  combination  of  the  standard  unit  vectors.  For  example,  the  only  way  to  express  the  vector  (3,  2)  as  a 
linear  combination  of  i = (1,  0)  and  j = (0,  1)  is 

(3, 2) = 3(1, 0) + 2(0, 1) = 3i + 2j (1)


(Figure 4.3.1). Suppose, however, that we were to introduce a third coordinate axis that makes an angle of 45°
with the x-axis. Call it the w-axis. As illustrated in Figure 4.3.2, the unit vector along the w-axis is

w = (1/\sqrt{2}, 1/\sqrt{2})


Whereas Formula (1) shows the only way to express the vector (3, 2) as a linear combination of i and j, there
are infinitely many ways to express this vector as a linear combination of i, j, and w. Three possibilities are

(3, 2) = 3(1, 0) + 2(0, 1) + 0(1/\sqrt{2}, 1/\sqrt{2}) = 3i + 2j + 0w

(3, 2) = 2(1, 0) + (0, 1) + \sqrt{2}(1/\sqrt{2}, 1/\sqrt{2}) = 2i + j + \sqrt{2}w

(3, 2) = 4(1, 0) + 3(0, 1) - \sqrt{2}(1/\sqrt{2}, 1/\sqrt{2}) = 4i + 3j - \sqrt{2}w


In short, by introducing a superfluous axis we created the complication of having multiple ways of assigning
coordinates to points in the plane. What makes the vector w superfluous is the fact that it can be expressed as
a linear combination of the vectors i and j, namely,

w = (1/\sqrt{2}, 1/\sqrt{2}) = \frac{1}{\sqrt{2}} i + \frac{1}{\sqrt{2}} j

Thus,  one  of  our  main  tasks  in  this  section  will  be  to  develop  ways  of  ascertaining  whether  one  vector  in  a set 
S is  a linear  combination  of  other  vectors  in  S. 


Figure  4.3.1 


Linear  Independence  and  Dependence 


We  will  often  apply  the  terms  linearly 
independent  and  linearly  dependent  to  the 
vectors  themselves  rather  than  to  the  set. 


DEFINITION  1 

If S = {v_1, v_2, ..., v_r} is a nonempty set of vectors in a vector space V, then the vector equation

k_1 v_1 + k_2 v_2 + ... + k_r v_r = 0

has at least one solution, namely,

k_1 = 0, k_2 = 0, ..., k_r = 0

We  call  this  the  trivial  solution.  If  this  is  the  only  solution,  then  S is  said  to  be  a linearly  independent 
set.  If  there  are  solutions  in  addition  to  the  trivial  solution,  then  S is  said  to  be  a linearly  dependent 
set. 


EXAMPLE 1 Linear Independence of the Standard Unit Vectors in R^n

The most basic linearly independent set in R^n is the set of standard unit vectors

e_1 = (1, 0, 0, ..., 0), e_2 = (0, 1, 0, ..., 0), ..., e_n = (0, 0, 0, ..., 1)

For notational simplicity, we will prove the linear independence in R^3 of

i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)

The linear independence or linear dependence of these vectors is determined by whether there exist nontrivial
solutions of the vector equation

k_1 i + k_2 j + k_3 k = 0 (2)

Since the component form of this equation is

(k_1, k_2, k_3) = (0, 0, 0)

it follows that k_1 = k_2 = k_3 = 0. This implies that (2) has only the trivial solution and hence that the vectors are
linearly independent.


EXAMPLE 2 Linear Independence in R^3

Determine whether the vectors

v_1 = (1, -2, 3), v_2 = (5, 6, -1), v_3 = (3, 2, 1)

are linearly independent or linearly dependent in R^3.

Solution The linear independence or linear dependence of these vectors is determined by
whether there exist nontrivial solutions of the vector equation

k_1 v_1 + k_2 v_2 + k_3 v_3 = 0 (3)

or, equivalently, of

k_1(1, -2, 3) + k_2(5, 6, -1) + k_3(3, 2, 1) = (0, 0, 0)

Equating corresponding components on the two sides yields the homogeneous linear system

k_1 + 5k_2 + 3k_3 = 0
-2k_1 + 6k_2 + 2k_3 = 0 (4)
3k_1 - k_2 + k_3 = 0

Thus, our problem reduces to determining whether this system has nontrivial solutions. There
are various ways to do this; one possibility is to simply solve the system, which yields

k_1 = -\frac{1}{2}t, k_2 = -\frac{1}{2}t, k_3 = t

(we omit the details). This shows that the system has nontrivial solutions and hence that the
vectors are linearly dependent. A second method for obtaining the same result is to compute the
determinant of the coefficient matrix

A = \begin{bmatrix} 1 & 5 & 3 \\ -2 & 6 & 2 \\ 3 & -1 & 1 \end{bmatrix}

and use parts (b) and (g) of Theorem 2.3.8. We leave it for you to verify that det(A) = 0, from
which it follows that (3) has nontrivial solutions and the vectors are linearly dependent.
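The omitted details of Example 2 can be checked directly. The sketch below takes the parametric family k_1 = -t/2, k_2 = -t/2, k_3 = t (a reconstruction consistent with the dependency relation used in Example 7) and confirms that it sends k_1 v_1 + k_2 v_2 + k_3 v_3 to the zero vector. The function names are illustrative.

```python
from fractions import Fraction

v1, v2, v3 = (1, -2, 3), (5, 6, -1), (3, 2, 1)

def combination(k1, k2, k3):
    """Return k1*v1 + k2*v2 + k3*v3, computed component by component."""
    return tuple(k1 * a + k2 * b + k3 * c for a, b, c in zip(v1, v2, v3))

def solution(t):
    """Parametric solution k1 = -t/2, k2 = -t/2, k3 = t of system (4)."""
    t = Fraction(t)
    return (-t / 2, -t / 2, t)

for t in (1, 2, -7):
    # Each nonzero t gives a nontrivial solution, so the vectors are dependent.
    print(t, combination(*solution(t)))
```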


In Example 2, what relationship do you see
between the components of v_1, v_2, and v_3 and
the columns of the coefficient matrix A?


EXAMPLE 3 Linear Independence in R^4

Determine whether the vectors

v_1 = (1, 2, 2, -1), v_2 = (4, 9, 9, -4), v_3 = (5, 8, 9, -5)

in R^4 are linearly dependent or linearly independent.

Solution The linear independence or linear dependence of these vectors is determined by
whether there exist nontrivial solutions of the vector equation

k_1 v_1 + k_2 v_2 + k_3 v_3 = 0

or, equivalently, of

k_1(1, 2, 2, -1) + k_2(4, 9, 9, -4) + k_3(5, 8, 9, -5) = (0, 0, 0, 0)

Equating corresponding components on the two sides yields the homogeneous linear system

k_1 + 4k_2 + 5k_3 = 0
2k_1 + 9k_2 + 8k_3 = 0
2k_1 + 9k_2 + 9k_3 = 0
-k_1 - 4k_2 - 5k_3 = 0

We leave it for you to show that this system has only the trivial solution

k_1 = 0, k_2 = 0, k_3 = 0

from which you can conclude that v_1, v_2, and v_3 are linearly independent.
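The system in Example 3 has four equations and three unknowns, so the determinant test of Example 2 is not available. One alternative, sketched here under the assumption that r vectors are linearly independent exactly when the matrix having them as rows has rank r, is to count pivots. The names rank and is_independent are illustrative.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_independent(vectors):
    """True if the vectors are linearly independent (rank equals their count)."""
    return rank(vectors) == len(vectors)

example_3 = [[1, 2, 2, -1], [4, 9, 9, -4], [5, 8, 9, -5]]
example_2 = [[1, -2, 3], [5, 6, -1], [3, 2, 1]]
print(is_independent(example_3))
print(is_independent(example_2))
```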


EXAMPLE 4 An Important Linearly Independent Set in P_n

Show that the polynomials

1, x, x^2, ..., x^n

form a linearly independent set in P_n.

Solution For convenience, let us denote the polynomials as

p_0 = 1, p_1 = x, p_2 = x^2, ..., p_n = x^n

We must show that the vector equation

a_0 p_0 + a_1 p_1 + a_2 p_2 + ... + a_n p_n = 0 (5)

has only the trivial solution

a_0 = a_1 = a_2 = ... = a_n = 0

But (5) is equivalent to the statement that

a_0 + a_1 x + a_2 x^2 + ... + a_n x^n = 0 (6)

for all x in (-∞, ∞), so we must show that this holds if and only if each coefficient in (6) is zero.
To see that this is so, recall from algebra that a nonzero polynomial of degree n has at most n
distinct roots. That being the case, each coefficient in (6) must be zero, for otherwise the left side of
the equation would be a nonzero polynomial with infinitely many roots. Thus, (5) has only the
trivial solution.


The  following  example  shows  that  the  problem  of  determining  whether  a given  set  of  vectors  in  Pn  is  linearly 
independent  or  linearly  dependent  can  be  reduced  to  determining  whether  a certain  set  of  vectors  in  Rn  is 
linearly  dependent  or  independent. 

EXAMPLE  5 Linear  Independence  of  Polynomials 

Determine whether the polynomials

p_1 = 1 - x, p_2 = 5 + 3x - 2x^2, p_3 = 1 + 3x - x^2

are linearly dependent or linearly independent in P_2.

Solution The linear independence or linear dependence of these vectors is determined by
whether there exist nontrivial solutions of the vector equation

k_1 p_1 + k_2 p_2 + k_3 p_3 = 0 (7)

This equation can be written as

k_1(1 - x) + k_2(5 + 3x - 2x^2) + k_3(1 + 3x - x^2) = 0 (8)

or, equivalently, as

(k_1 + 5k_2 + k_3) + (-k_1 + 3k_2 + 3k_3)x + (-2k_2 - k_3)x^2 = 0

Since this equation must be satisfied by all x in (-∞, ∞), each coefficient must be zero (as
explained in the previous example). Thus, the linear dependence or independence of the given
polynomials hinges on whether the following linear system has a nontrivial solution:

k_1 + 5k_2 + k_3 = 0
-k_1 + 3k_2 + 3k_3 = 0 (9)
-2k_2 - k_3 = 0

We leave it for you to show that this linear system has nontrivial solutions, either by solving it
directly or by showing that the coefficient matrix has determinant zero. Thus, the set
{p_1, p_2, p_3} is linearly dependent.
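The reduction described before Example 5 amounts to replacing each polynomial by its list of coefficients. The sketch below stores a_0 + a_1 x + a_2 x^2 as the list [a_0, a_1, a_2] and then applies a rank test. The representation and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def polynomials_independent(coefficient_lists):
    """True if the polynomials with these coefficient lists are independent."""
    return rank(coefficient_lists) == len(coefficient_lists)

# Coefficients in the order [constant, x, x^2].
p1 = [1, -1, 0]    # 1 - x
p2 = [5, 3, -2]    # 5 + 3x - 2x^2
p3 = [1, 3, -1]    # 1 + 3x - x^2
print(polynomials_independent([p1, p2, p3]))

# The set {1, x, x^2} of Example 4 (with n = 2), for comparison.
print(polynomials_independent([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
```

The coefficient lists are exactly the columns of the coefficient matrix of system (9).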


In  Example  5,  what  relationship  do  you  see 
between  the  coefficients  of  the  given 
polynomials  and  the  column  vectors  of  the 
coefficient matrix of system (9)?


An  Alternative  Interpretation  of  Linear  Independence 

The  terms  linearly  dependent  and  linearly  independent  are  intended  to  indicate  whether  the  vectors  in  a given 
set  are  interrelated  in  some  way.  The  following  theorem,  whose  proof  is  deferred  to  the  end  of  this  section, 
makes  this  idea  more  precise. 


THEOREM  4.3.1 

A set  S with  two  or  more  vectors  is 

(a)  Linearly  dependent  if  and  only  if  at  least  one  of  the  vectors  in  S is  expressible  as  a linear 
combination  of  the  other  vectors  in  S. 

(b)  Linearly  independent  if  and  only  if  no  vector  in  S is  expressible  as  a linear  combination  of  the 
other  vectors  in  S. 


EXAMPLE  6 Example  1 Revisited 


In Example 1 we showed that the standard unit vectors in R^n are linearly independent. Thus, it
follows from Theorem 4.3.1 that none of these vectors is expressible as a linear combination of
the others. To illustrate this in R^3, suppose, for example, that

k = k_1 i + k_2 j

or in terms of components that

(0, 0, 1) = (k_1, k_2, 0)

Since this equation cannot be satisfied by any values of k_1 and k_2, there is no way to express k
as a linear combination of i and j. Similarly, i is not expressible as a linear combination of j and
k, and j is not expressible as a linear combination of i and k.


EXAMPLE  7 Example  2 Revisited 

In Example 2 we saw that the vectors

v_1 = (1, -2, 3), v_2 = (5, 6, -1), v_3 = (3, 2, 1)

are linearly dependent. Thus, it follows from Theorem 4.3.1 that at least one of these vectors is
expressible as a linear combination of the other two. We leave it for you to confirm that these
vectors satisfy the equation

\frac{1}{2} v_1 + \frac{1}{2} v_2 - v_3 = 0

from which it follows, for example, that

v_3 = \frac{1}{2} v_1 + \frac{1}{2} v_2


Sets  with  One  or  Two  Vectors 

The  following  basic  theorem  is  concerned  with  the  linear  independence  and  linear  dependence  of  sets  with 
one  or  two  vectors  and  sets  that  contain  the  zero  vector. 


THEOREM  4.3.2 

(a)  A finite  set  that  contains  0 is  linearly  dependent. 

(b)  A set  with  exactly  one  vector  is  linearly  independent  if  and  only  if  that  vector  is  not  0. 

(c)  A set  with  exactly  two  vectors  is  linearly  independent  if  and  only  if  neither  vector  is  a scalar 
multiple  of  the  other. 


Jozef  Hoene  de  Wronski  (1778-1853) 

The Polish-French mathematician Jozef Hoene de Wronski was born Jozef Hoene
and adopted the name Wronski after he married. Wronski's life was fraught with controversy and
conflict, which some say was due to his psychopathic tendencies and his exaggeration of the
importance of his own work. Although Wronski's work was dismissed as rubbish for many years, and
much of it was indeed erroneous, some of his ideas contained hidden brilliance and have survived.
Among other things, Wronski designed a caterpillar vehicle to compete with trains (though it was
never manufactured) and did research on the famous problem of determining the longitude of a ship at
sea. His final years were spent in poverty.

[Image:  wikipedia] 


Proof We will prove part (a) and leave the rest as exercises.

For any vectors v_1, v_2, ..., v_r, the set S = {v_1, v_2, ..., v_r, 0} is linearly dependent since the
equation

0v_1 + 0v_2 + ... + 0v_r + 1(0) = 0

expresses 0 as a linear combination of the vectors in S with coefficients that are not all zero.

EXAMPLE  8 Linear  Independence  of  Two  Functions 

The functions f_1 = x and f_2 = sin x are linearly independent vectors in F(-∞, ∞) since
neither function is a scalar multiple of the other. On the other hand, the two functions
g_1 = sin 2x and g_2 = sin x cos x are linearly dependent because the trigonometric identity
sin 2x = 2 sin x cos x reveals that g_1 and g_2 are scalar multiples of each other.


A Geometric  Interpretation  of  Linear  Independence 

Linear independence has the following useful geometric interpretations in R^2 and R^3:

Two vectors in R^2 or R^3 are linearly independent if and only if they do not lie on the same line when they
have their initial points at the origin. Otherwise one would be a scalar multiple of the other (Figure 4.3.3).


Figure 4.3.3

Three vectors in R^3 are linearly independent if and only if they do not lie in the same plane when they have
their initial points at the origin. Otherwise at least one would be a linear combination of the other two
(Figure 4.3.4).


(a) Linearly dependent (b) Linearly dependent (c) Linearly independent

Figure  4.3.4 


At the beginning of this section we observed that a third coordinate axis in R^2 is superfluous by showing that
a unit vector along such an axis would have to be expressible as a linear combination of unit vectors along the
positive x- and y-axes. That result is a consequence of the next theorem, which shows that there can be at most
n vectors in any linearly independent set in R^n.

It follows from Theorem 4.3.3, for example,
that a set in R^2 with more than two vectors is
linearly dependent and a set in R^3 with more
than three vectors is linearly dependent.


THEOREM  4.3.3 

Let S = {v_1, v_2, ..., v_r} be a set of vectors in R^n. If r > n, then S is linearly dependent.

Proof Suppose that

v_1 = (v_{11}, v_{12}, ..., v_{1n})
v_2 = (v_{21}, v_{22}, ..., v_{2n})
...
v_r = (v_{r1}, v_{r2}, ..., v_{rn})

and consider the equation

k_1 v_1 + k_2 v_2 + ... + k_r v_r = 0

If we express both sides of this equation in terms of components and then equate the corresponding
components, we obtain the system

v_{11} k_1 + v_{21} k_2 + ... + v_{r1} k_r = 0
v_{12} k_1 + v_{22} k_2 + ... + v_{r2} k_r = 0
...
v_{1n} k_1 + v_{2n} k_2 + ... + v_{rn} k_r = 0

This is a homogeneous system of n equations in the r unknowns k_1, ..., k_r. Since r > n, it follows from
Theorem 1.2.2 that the system has nontrivial solutions. Therefore, S = {v_1, v_2, ..., v_r} is a linearly
dependent set.
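The proof of Theorem 4.3.3 is constructive in spirit: reducing the homogeneous system exposes a free variable, and setting it to 1 yields an explicit dependency. The sketch below is one possible realization using Gauss-Jordan elimination over exact fractions. The function name nontrivial_solution is hypothetical, and the three vectors in R^2 are chosen to echo the superfluous w-axis, with the unscaled direction (1, 1) standing in for w.

```python
from fractions import Fraction

def nontrivial_solution(vectors):
    """Return coefficients k, not all zero, with sum(k[j] * vectors[j]) = 0,
    or None if the only solution is the trivial one."""
    r, n = len(vectors), len(vectors[0])
    # Row i of the system collects the i-th components of all r vectors.
    m = [[Fraction(vectors[j][i]) for j in range(r)] for i in range(n)]
    pivots = []  # pivot column of each reduced row, in order
    row = 0
    for col in range(r):
        if row == n:
            break
        p = next((i for i in range(row, n) if m[i][col] != 0), None)
        if p is None:
            continue
        m[row], m[p] = m[p], m[row]
        pivot_value = m[row][col]
        m[row] = [x / pivot_value for x in m[row]]
        for i in range(n):
            if i != row and m[i][col] != 0:
                f = m[i][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[row])]
        pivots.append(col)
        row += 1
    free = [c for c in range(r) if c not in pivots]
    if not free:
        return None
    # Set the first free unknown to 1 and solve for the pivot unknowns.
    k = [Fraction(0)] * r
    k[free[0]] = Fraction(1)
    for i, col in enumerate(pivots):
        k[col] = -m[i][free[0]]
    return k

vectors = [[1, 0], [0, 1], [1, 1]]   # three vectors in R^2, so r > n
k = nontrivial_solution(vectors)
print(k)
```

When r > n there are at most n pivot columns among the r columns, so a free column always exists and the function cannot return None.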

CALCULUS  REQUIRED 

Linear  Independence  of  Functions 

Sometimes  linear  dependence  of  functions  can  be  deduced  from  known  identities.  For  example,  the  functions 

$f_1 = \sin^2 x$, $f_2 = \cos^2 x$, and $f_3 = 5$

form a linearly dependent set in $F(-\infty, \infty)$, since the equation

\[
5f_1 + 5f_2 - f_3 = 5\sin^2 x + 5\cos^2 x - 5 = 5(\sin^2 x + \cos^2 x) - 5 = 0
\]

expresses 0 as a linear combination of $f_1$, $f_2$, and $f_3$ with coefficients that are not all zero.
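As a quick numerical illustration (a sketch only, and no substitute for the identity $\sin^2 x + \cos^2 x = 1$), the combination $5f_1 + 5f_2 - f_3$ with $f_1 = \sin^2 x$, $f_2 = \cos^2 x$, $f_3 = 5$ can be evaluated at a few sample points. The function name `combo` and the sample points are assumptions made for this example.

```python
import math

def combo(x):
    # 5*f1 + 5*f2 - f3 with f1 = sin^2 x, f2 = cos^2 x, f3 = 5
    return 5 * math.sin(x) ** 2 + 5 * math.cos(x) ** 2 - 5

samples = [-3.0, -1.0, 0.0, 0.5, 2.0, 10.0]
# Largest deviation from zero over the sample points (floating-point round-off only).
print(max(abs(combo(x)) for x in samples))
```

A vanishing result at finitely many points does not prove dependence; here the proof is the trigonometric identity, and the code merely agrees with it.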

Unfortunately,  there  is  no  general  method  that  can  be  used  to  determine  whether  a set  of  functions  is  linearly 
independent  or  linearly  dependent.  However,  there  does  exist  a theorem  that  is  useful  for  establishing  linear 
independence  in  certain  circumstances.  The  following  definition  will  be  useful  for  discussing  that  theorem. 


DEFINITION  2 


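If $f_1 = f_1(x), f_2 = f_2(x), \ldots, f_n = f_n(x)$ are functions that are $n - 1$ times differentiable on the
interval $(-\infty, \infty)$, then the determinant

\[
W(x) =
\begin{vmatrix}
f_1(x) & f_2(x) & \cdots & f_n(x) \\
f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\
\vdots & \vdots & & \vdots \\
f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x)
\end{vmatrix}
\]

is called the Wronskian of $f_1, f_2, \ldots, f_n$.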


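Suppose for the moment that $f_1 = f_1(x), f_2 = f_2(x), \ldots, f_n = f_n(x)$ are linearly dependent vectors in
$C^{(n-1)}(-\infty, \infty)$. This implies that for certain values of the coefficients the vector equation

\[ k_1 f_1 + k_2 f_2 + \cdots + k_n f_n = 0 \]

has a nontrivial solution, or equivalently that the equation

\[ k_1 f_1(x) + k_2 f_2(x) + \cdots + k_n f_n(x) = 0 \]

is satisfied for all x in $(-\infty, \infty)$. Using this equation together with those that result by differentiating it
$n - 1$ times yields the linear system

\[
\begin{aligned}
k_1 f_1(x) + k_2 f_2(x) + \cdots + k_n f_n(x) &= 0 \\
k_1 f_1'(x) + k_2 f_2'(x) + \cdots + k_n f_n'(x) &= 0 \\
&\;\;\vdots \\
k_1 f_1^{(n-1)}(x) + k_2 f_2^{(n-1)}(x) + \cdots + k_n f_n^{(n-1)}(x) &= 0
\end{aligned}
\]

Thus, the linear dependence of $f_1, f_2, \ldots, f_n$ implies that the linear system

\[
\begin{bmatrix}
f_1(x) & f_2(x) & \cdots & f_n(x) \\
f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\
\vdots & \vdots & & \vdots \\
f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x)
\end{bmatrix}
\begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_n \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\tag{10}
\]

has a nontrivial solution for every x in $(-\infty, \infty)$. But this implies that the determinant of the coefficient matrix of (10) is zero for every
such x. Since this determinant is the Wronskian of $f_1, f_2, \ldots, f_n$, we have established the following result.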


THEOREM  4.3.4 

If the functions $f_1, f_2, \ldots, f_n$ have $n - 1$ continuous derivatives on the interval $(-\infty, \infty)$, and if the
Wronskian of these functions is not identically zero on $(-\infty, \infty)$, then these functions form a
linearly independent set of vectors in $C^{(n-1)}(-\infty, \infty)$.


In  Example  8 we  showed  that  x and  sin  x are  linearly  independent  functions  by  observing  that  neither  is  a 
scalar  multiple  of  the  other.  The  following  example  shows  how  to  obtain  the  same  result  using  the  Wronskian 
(though  it  is  a more  complicated  procedure  in  this  particular  case). 

EXAMPLE  9 Linear  Independence  Using  the  Wronskian 


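Use the Wronskian to show that $f_1 = x$ and $f_2 = \sin x$ are linearly independent.

Solution  The Wronskian is

\[
W(x) =
\begin{vmatrix}
x & \sin x \\
1 & \cos x
\end{vmatrix}
= x\cos x - \sin x
\]

This function is not identically zero on the interval $(-\infty, \infty)$ since, for example,

\[
W\!\left(\tfrac{\pi}{2}\right) = \tfrac{\pi}{2}\cos\tfrac{\pi}{2} - \sin\tfrac{\pi}{2} = -1
\]

Thus, the functions are linearly independent.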
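To apply Theorem 4.3.4 it is enough to exhibit a single x at which the Wronskian is nonzero. The following Python sketch evaluates the 2 × 2 Wronskian of x and sin x numerically; the helper name `wronskian_2`, the hand-supplied derivatives, and the sample point π/2 are assumptions made for illustration.

```python
import math

def wronskian_2(f, df, g, dg, x):
    """2x2 Wronskian f(x) g'(x) - g(x) f'(x); the derivatives df, dg are supplied by the caller."""
    return f(x) * dg(x) - g(x) * df(x)

# f1(x) = x with derivative 1, f2(x) = sin x with derivative cos x.
w = wronskian_2(lambda x: x, lambda x: 1.0, math.sin, math.cos, math.pi / 2)
print(w)  # approximately -1, up to floating-point round-off
```

Note that the same Wronskian equals 0 at x = 0, which is harmless: the theorem only requires that W not be identically zero.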


WARNING 


The converse of Theorem 4.3.4 is false. If the
Wronskian of $f_1, f_2, \ldots, f_n$ is identically zero
on $(-\infty, \infty)$, then no conclusion can be
reached about the linear independence of
$\{f_1, f_2, \ldots, f_n\}$; this set of vectors may be
linearly independent or linearly dependent.

EXAMPLE  10  Linear  Independence  Using  the  Wronskian 

Use the Wronskian to show that $f_1 = 1$, $f_2 = e^x$, and $f_3 = e^{2x}$ are linearly independent.

Solution  The Wronskian is

\[
W(x) =
\begin{vmatrix}
1 & e^x & e^{2x} \\
0 & e^x & 2e^{2x} \\
0 & e^x & 4e^{2x}
\end{vmatrix}
= 2e^{3x}
\]

This function is obviously not identically zero on $(-\infty, \infty)$, so $f_1$, $f_2$, and $f_3$ form a linearly
independent set.

OPTIONAL

We will close this section by proving part (a) of Theorem 4.3.1. We will leave the proof of part (b) as an
exercise.

Proof of Theorem 4.3.1(a)  Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ be a set with two or more vectors. If we assume
that S is linearly dependent, then there are scalars $k_1, k_2, \ldots, k_r$, not all zero, such that

\[ k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0} \tag{11} \]

To be specific, suppose that $k_1 \neq 0$. Then (11) can be rewritten as

\[ \mathbf{v}_1 = \left(-\frac{k_2}{k_1}\right)\mathbf{v}_2 + \cdots + \left(-\frac{k_r}{k_1}\right)\mathbf{v}_r \]

which expresses $\mathbf{v}_1$ as a linear combination of the other vectors in S. Similarly, if $k_j \neq 0$ in (11) for some
$j = 2, 3, \ldots, r$, then $\mathbf{v}_j$ is expressible as a linear combination of the other vectors in S.

Conversely, let us assume that at least one of the vectors in S is expressible as a linear combination of the
other vectors. To be specific, suppose that

\[ \mathbf{v}_1 = c_2\mathbf{v}_2 + c_3\mathbf{v}_3 + \cdots + c_r\mathbf{v}_r \]

so

\[ \mathbf{v}_1 - c_2\mathbf{v}_2 - c_3\mathbf{v}_3 - \cdots - c_r\mathbf{v}_r = \mathbf{0} \]

It follows that S is linearly dependent since the equation

\[ k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r = \mathbf{0} \]

is satisfied by

\[ k_1 = 1, \quad k_2 = -c_2, \quad \ldots, \quad k_r = -c_r \]

which are not all zero. The proof in the case where some vector other than $\mathbf{v}_1$ is expressible as a linear
combination of the other vectors in S is similar.


Concept  Review 

Trivial  solution 
Linearly  independent  set 
Linearly  dependent  set 
Wronskian 

Skills 

Determine  whether  a set  of  vectors  is  linearly  independent  or  linearly  dependent. 

Express  one  vector  in  a linearly  dependent  set  as  a linear  combination  of  the  other  vectors  in  the  set. 
Use  the  Wronskian  to  show  that  a set  of  functions  is  linearly  independent. 


Exercise  Set  4.3 


1.  Explain  why  the  following  are  linearly  dependent  sets  of  vectors.  (Solve  this  problem  by  inspection.) 

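(a) $\mathbf{u}_1 = (-1, 2, 4)$ and $\mathbf{u}_2 = (5, -10, -20)$ in $R^3$

(b) $\mathbf{u}_1 = (3, -1)$, $\mathbf{u}_2 = (4, 5)$, $\mathbf{u}_3 = (-4, 7)$ in $R^2$

(c) $\mathbf{p}_1 = 3 - 2x + x^2$ and $\mathbf{p}_2 = 6 - 4x + 2x^2$ in $P_2$

(d) $A = \begin{bmatrix} -3 & 4 \\ 2 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & -4 \\ -2 & 0 \end{bmatrix}$ in $M_{22}$

Answer:

(a) $\mathbf{u}_2$ is a scalar multiple of $\mathbf{u}_1$.

(b) The vectors are linearly dependent by Theorem 4.3.3.

(c) $\mathbf{p}_2$ is a scalar multiple of $\mathbf{p}_1$.

(d) B is a scalar multiple of A.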

2.  Which of the following sets of vectors in $R^3$ are linearly dependent?

(a) (4, -1, 2), (-4, 10, 2)

(b) (-3, 0, 4), (5, -1, 2), (1, 1, 3)

(c) (8, -1, 3), (4, 0, 1)

(d) (-2, 0, 1), (3, 2, 5), (6, -1, 1), (7, 0, -2)

3.  Which of the following sets of vectors in $R^4$ are linearly dependent?

(a) (3, 8, 7, -3), (1, 5, 3, -1), (2, -1, 2, 6), (1, 4, 0, 3)

(b) (0, 0, 2, 2), (3, 3, 0, 0), (1, 1, 0, -1)

(c) (0, 3, -3, -6), (-2, 0, 0, -6), (0, -4, -2, -2), (0, -8, 4, -4)

(d) (3, 0, -3, 6), (0, 2, 3, 1), (0, -2, -2, 0), (-2, 1, 2, 1)

Answer: 

None 

4.  Which  of  the  following  sets  of  vectors  in  P2  are  linearly  dependent? 

(a) $2 - x + 4x^2$, $3 + 6x + 2x^2$, $2 + 10x - 4x^2$

(b) $3 + x + x^2$, $2 - x + 5x^2$, $4 - 3x^2$

(c) $6 - x^2$

(d) $1 + 3x + 3x^2$, $x + 4x^2$, $5 + 6x + 3x^2$, $7 + 2x - x^2$

5.  Assume that $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are vectors in $R^3$ that have their initial points at the origin. In each part,
determine whether the three vectors lie in a plane.

(a) $\mathbf{v}_1 = (2, -2, 0)$, $\mathbf{v}_2 = (6, 1, 4)$, $\mathbf{v}_3 = (2, 0, -4)$

(b) $\mathbf{v}_1 = (-6, 7, 2)$, $\mathbf{v}_2 = (3, 2, 4)$, $\mathbf{v}_3 = (4, -1, 2)$

Answer: 

(a)  They  do  not  lie  in  a plane. 

(b)  They  do  lie  in  a plane. 

6.  Assume that $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ are vectors in $R^3$ that have their initial points at the origin. In each part,
determine whether the three vectors lie on the same line.

(a) $\mathbf{v}_1 = (-1, 2, 3)$, $\mathbf{v}_2 = (2, -4, -6)$, $\mathbf{v}_3 = (-3, 6, 0)$

(b) $\mathbf{v}_1 = (2, -1, 4)$, $\mathbf{v}_2 = (4, 2, 3)$, $\mathbf{v}_3 = (2, 7, -6)$

(c) $\mathbf{v}_1 = (4, 6, 8)$, $\mathbf{v}_2 = (2, 3, 4)$, $\mathbf{v}_3 = (-2, -3, -4)$

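7.  (a) Show that the three vectors $\mathbf{v}_1 = (0, 3, 1, -1)$, $\mathbf{v}_2 = (6, 0, 5, 1)$, and $\mathbf{v}_3 = (4, -7, 1, 3)$ form a
linearly dependent set in $R^4$.

(b) Express each vector in part (a) as a linear combination of the other two.

Answer:

(b) $\mathbf{v}_1 = \frac{2}{7}\mathbf{v}_2 - \frac{3}{7}\mathbf{v}_3$, $\mathbf{v}_2 = \frac{7}{2}\mathbf{v}_1 + \frac{3}{2}\mathbf{v}_3$, $\mathbf{v}_3 = -\frac{7}{3}\mathbf{v}_1 + \frac{2}{3}\mathbf{v}_2$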


8.  (a) Show that the three vectors $\mathbf{v}_1 = (1, 2, 3, 4)$, $\mathbf{v}_2 = (0, 1, 0, -1)$, and $\mathbf{v}_3 = (1, 3, 3, 3)$ form a
linearly dependent set in $R^4$.

(b) Express each vector in part (a) as a linear combination of the other two.

9.  For which real values of $\lambda$ do the following vectors form a linearly dependent set in


vi  = 


A, 


Answer: 


A 2’  A 1 

10.  Show that if $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly independent set of vectors, then so are
$\{\mathbf{v}_1, \mathbf{v}_2\}$, $\{\mathbf{v}_1, \mathbf{v}_3\}$, $\{\mathbf{v}_2, \mathbf{v}_3\}$, $\{\mathbf{v}_1\}$, $\{\mathbf{v}_2\}$, and $\{\mathbf{v}_3\}$.

11.  Show that if $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a linearly independent set of vectors, then so is every nonempty
subset of S.

12.  Show that if $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly dependent set of vectors in a vector space V, and $\mathbf{v}_4$ is any
vector in V that is not in S, then $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ is also linearly dependent.

13.  Show that if $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a linearly dependent set of vectors in a vector space V, and if
$\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n$ are any vectors in V that are not in S, then $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}$ is also linearly
dependent.

14.  Show that in $P_2$ every set with more than three vectors is linearly dependent.

15.  Show that if $\{\mathbf{v}_1, \mathbf{v}_2\}$ is linearly independent and $\mathbf{v}_3$ does not lie in span$\{\mathbf{v}_1, \mathbf{v}_2\}$, then $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is
linearly independent.

16.  Prove: For any vectors u, v, and w in a vector space V, the vectors u − v, v − w, and w − u form a
linearly dependent set.

17.  Prove: The space spanned by two vectors in $R^3$ is a line through the origin, a plane through the origin, or
the origin itself.

18.  Under  what  conditions  is  a set  with  one  vector  linearly  independent? 

19.  Are the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ in part (a) of the accompanying figure linearly independent? What about
those in part (b)? Explain.

Figure Ex-19

Answer:

(a) They are linearly independent since $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ do not lie in the same plane when they are placed
with their initial points at the origin.

(b) They are not linearly independent since $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ lie in the same plane when they are placed
with their initial points at the origin.

20.  By using appropriate identities, where required, determine which of the following sets of vectors in
$F(-\infty, \infty)$ are linearly dependent.

(a) $6$, $3\sin^2 x$, $2\cos^2 x$

(b)  x,  cos  x 

(c)  1,  sin  x,  sin  2x 

(d)  cos  2x,  sin  x,  cos  x 

(e)  (3-x)2,  x2-6x,  5 

(f)  0,  COS  5TX,  SUV  3xx 

21.  The functions $f_1(x) = x$ and $f_2(x) = \cos x$ are linearly independent in $F(-\infty, \infty)$ because neither
function is a scalar multiple of the other. Confirm the linear independence using Wronski's test.

Answer:

$W(x) = -x\sin x - \cos x \neq 0$ for some x.

22.  The functions $f_1(x) = \sin x$ and $f_2(x) = \cos x$ are linearly independent in $F(-\infty, \infty)$ because
neither function is a scalar multiple of the other. Confirm the linear independence using Wronski's test.

23.  (Calculus  required)  Use  the  Wronskian  to  show  that  the  following  sets  of  vectors  are  linearly 
independent. 

(a) $1$, $x$, $e^x$

(b) $1$, $x$, $x^2$

Answer:

(a) $W(x) = e^x \neq 0$

(b) $W(x) = 2 \neq 0$

24.  Show that the functions $f_1(x) = e^x$, $f_2(x) = xe^x$, and $f_3(x) = x^2e^x$ are linearly independent.

25.  Show that the functions $f_1(x) = \sin x$, $f_2(x) = \cos x$, and $f_3(x) = x\cos x$ are linearly independent.

Answer:

$W(x) = 2\sin x \neq 0$ for some x.

26.  Use part (a) of Theorem 4.3.1 to prove part (b).


27.  Prove  part  ( b ) of  Theorem  4.3.2. 

28.  (a) In Example 1 we showed that the mutually orthogonal vectors i, j, and k form a linearly independent
set of vectors in $R^3$. Do you think that every set of three nonzero mutually orthogonal vectors in $R^3$ is
linearly independent? Justify your conclusion with a geometric argument.

(b)  Justify  your  conclusion  with  an  algebraic  argument.  [Hint:  Use  dot  products.] 

True-False  Exercises 

In  parts  (a)-(h)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  A set  containing  a single  vector  is  linearly  independent. 

Answer: 

False 

(b) The set of vectors $\{\mathbf{v}, k\mathbf{v}\}$ is linearly dependent for every scalar k.

Answer: 

True 

(c)  Every  linearly  dependent  set  contains  the  zero  vector. 

Answer: 

False 

(d) If the set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is linearly independent, then $\{k\mathbf{v}_1, k\mathbf{v}_2, k\mathbf{v}_3\}$ is also linearly
independent for every nonzero scalar k.

Answer: 

True 

(e) If $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are linearly dependent nonzero vectors, then at least one vector $\mathbf{v}_k$ is a unique linear
combination of $\mathbf{v}_1, \ldots, \mathbf{v}_{k-1}$.

Answer: 

True 

(f) The set of 2 × 2 matrices that contain exactly two 1's and two 0's is a linearly independent set in $M_{22}$.

Answer: 

False 

(g) The three polynomials $(x - 1)(x + 2)$, $x(x + 2)$, and $x(x - 1)$ are linearly independent.

Answer: 


True 


(h) The functions $f_1$ and $f_2$ are linearly dependent if there is a real number x so that
$k_1 f_1(x) + k_2 f_2(x) = 0$ for some scalars $k_1$ and $k_2$.

Answer: 

False 
4.4 Coordinates and Basis

We  usually  think  of  a line  as  being  one-dimensional,  a plane  as  two-dimensional,  and  the  space  around  us  as  three- 
dimensional.  It  is  the  primary  goal  of  this  section  and  the  next  to  make  this  intuitive  notion  of  dimension  precise. 
In  this  section  we  will  discuss  coordinate  systems  in  general  vector  spaces  and  lay  the  groundwork  for  a precise 
definition  of  dimension  in  the  next  section. 


Coordinate  Systems  in  Linear  Algebra 

In  analytic  geometry  we  learned  to  use  rectangular  coordinate  systems  to  create  a one-to-one  correspondence 
between points in 2-space and ordered pairs of real numbers and between points in 3-space and ordered triples of
real  numbers  (Figure  4.4.1).  Although  rectangular  coordinate  systems  are  common,  they  are  not  essential.  For 
example,  Figure  4.4.2  shows  coordinate  systems  in  2-space  and  3-space  in  which  the  coordinate  axes  are  not 
mutually  perpendicular. 


Coordinates  of  P in  a rectangular 
coordinate  system  in  2-space. 


Figure  4.4.1 


In  linear  algebra  coordinate  systems  are  commonly  specified  using  vectors  rather  than  coordinate  axes.  For 
example,  in  Figure  4.4.3  we  have  recreated  the  coordinate  systems  in  Figure  4.4.2  by  using  unit  vectors  to  identify 
the  positive  directions  and  then  attaching  coordinates  to  a point  P using  the  scalar  coefficients  in  the  equations 

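\[ \overrightarrow{OP} = a\mathbf{u}_1 + b\mathbf{u}_2 \quad\text{and}\quad \overrightarrow{OP} = a\mathbf{u}_1 + b\mathbf{u}_2 + c\mathbf{u}_3 \]

Figure 4.4.3  (point labels: P(a, b) in 2-space and P(a, b, c) in 3-space)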


Units  of  measurement  are  essential  ingredients  of  any  coordinate  system.  In  geometry  problems  one  tries  to  use 
the  same  unit  of  measurement  on  all  axes  to  avoid  distorting  the  shapes  of  figures.  This  is  less  important  in 
applications  where  coordinates  represent  physical  quantities  with  diverse  units  (for  example,  time  in  seconds  on 
one  axis  and  temperature  in  degrees  Celsius  on  another  axis).  To  allow  for  this  level  of  generality,  we  will  relax 
the  requirement  that  unit  vectors  be  used  to  identify  the  positive  directions  and  require  only  that  those  vectors  be 
linearly independent. We will refer to these as the “basis vectors” for the coordinate system. In summary, it is the
directions  of  the  basis  vectors  that  establish  the  positive  directions,  and  it  is  the  lengths  of  the  basis  vectors  that 
establish  the  spacing  between  the  integer  points  on  the  axes  (Figure  4.4.4). 


Figure 4.4.4  (panels: equal spacing, perpendicular axes; unequal spacing, perpendicular axes; equal spacing, skew axes; unequal spacing, skew axes)


Basis  for  a Vector  Space 

The  following  definition  will  make  the  preceding  ideas  more  precise  and  will  enable  us  to  extend  the  concept  of  a 
coordinate  system  to  general  vector  spaces. 

Note  that  in  Definition  1 we  have  required  a basis 
to  have  finitely  many  vectors.  Some  authors  call 
this  a finite  basis , but  we  will  not  use  this 
terminology. 




DEFINITION  1 


If V is any vector space and $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a finite set of vectors in V, then S is called a basis for
V if  the  following  two  conditions  hold: 

(a)  S is  linearly  independent. 

(b)  S spans  V. 


If you think of a basis as describing a coordinate system for a vector space V, then part (a) of this definition
guarantees  that  there  is  no  interrelationship  between  the  basis  vectors,  and  part  (b)  guarantees  that  there  are 
enough  basis  vectors  to  provide  coordinates  for  all  vectors  in  V.  Here  are  some  examples. 

EXAMPLE  1 The  Standard  Basis  for  Rn 


Recall from Example 11 of Section 4.2 that the standard unit vectors

\[ \mathbf{e}_1 = (1, 0, 0, \ldots, 0), \quad \mathbf{e}_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad \mathbf{e}_n = (0, 0, 0, \ldots, 1) \]

span $R^n$ and from Example 1 of Section 4.3 that they are linearly independent. Thus, they form a
basis for $R^n$ that we call the standard basis for $R^n$. In particular,

\[ \mathbf{i} = (1, 0, 0), \quad \mathbf{j} = (0, 1, 0), \quad \mathbf{k} = (0, 0, 1) \]

is the standard basis for $R^3$.


EXAMPLE  2 The  Standard  Basis  for  Pn 

Show that $S = \{1, x, x^2, \ldots, x^n\}$ is a basis for the vector space $P_n$ of polynomials of degree n or
less.

Solution  We must show that the polynomials in S are linearly independent and span $P_n$. Let us
denote these polynomials by

\[ \mathbf{p}_0 = 1, \quad \mathbf{p}_1 = x, \quad \mathbf{p}_2 = x^2, \quad \ldots, \quad \mathbf{p}_n = x^n \]

We showed in Example 13 of Section 4.2 that these vectors span $P_n$ and in Example 4 of Section
4.3 that they are linearly independent. Thus, they form a basis for $P_n$ that we call the standard basis
for $P_n$.


EXAMPLE  3 Another  Basis  for  R3 

Show that the vectors $\mathbf{v}_1 = (1, 2, 1)$, $\mathbf{v}_2 = (2, 9, 0)$, and $\mathbf{v}_3 = (3, 3, 4)$ form a basis for $R^3$.

Solution  We must show that these vectors are linearly independent and span $R^3$. To prove linear
independence we must show that the vector equation

\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 = \mathbf{0} \tag{1} \]

has only the trivial solution; and to prove that the vectors span $R^3$ we must show that every vector
$\mathbf{b} = (b_1, b_2, b_3)$ in $R^3$ can be expressed as

\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 = \mathbf{b} \tag{2} \]

By equating corresponding components on the two sides, these two equations can be expressed as
the linear systems

\[
\begin{aligned}
c_1 + 2c_2 + 3c_3 &= 0 \\
2c_1 + 9c_2 + 3c_3 &= 0 \\
c_1 \phantom{{}+ 2c_2} + 4c_3 &= 0
\end{aligned}
\qquad\text{and}\qquad
\begin{aligned}
c_1 + 2c_2 + 3c_3 &= b_1 \\
2c_1 + 9c_2 + 3c_3 &= b_2 \\
c_1 \phantom{{}+ 2c_2} + 4c_3 &= b_3
\end{aligned}
\tag{3}
\]

(verify). Thus, we have reduced the problem to showing that in (3) the homogeneous system has only
the trivial solution and that the nonhomogeneous system is consistent for all values of $b_1$, $b_2$, and $b_3$.
But the two systems have the same coefficient matrix

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 9 & 3 \\ 1 & 0 & 4 \end{bmatrix}
\]

so it follows from parts (b), (e), and (g) of Theorem 2.3.8 that we can prove both results at the same
time by showing that $\det(A) \neq 0$. We leave it for you to confirm that $\det(A) = -1$, which proves
that the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ form a basis for $R^3$.
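The determinant left for the reader in Example 3 can be confirmed with a short computation. This is a minimal Python sketch rather than anything from the text; the helper name `det3` is an assumption, and it uses cofactor expansion along the first row. The columns of A are the vectors (1, 2, 1), (2, 9, 0), and (3, 3, 4).

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Coefficient matrix from Example 3: its columns are v1, v2, v3.
A = [[1, 2, 3],
     [2, 9, 3],
     [1, 0, 4]]
print(det3(A))  # -1
```

Because the determinant is nonzero, the homogeneous system has only the trivial solution and the nonhomogeneous system is consistent for every right-hand side, which is exactly what the example needs.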


EXAMPLE  4 The  Standard  Basis  for  Mmn 

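Show that the matrices

\[
M_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad
M_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad
M_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad
M_4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
\]

form a basis for the vector space $M_{22}$ of 2 × 2 matrices.

Solution  We must show that the matrices are linearly independent and span $M_{22}$. To prove linear
independence we must show that the equation

\[ c_1M_1 + c_2M_2 + c_3M_3 + c_4M_4 = 0 \tag{4} \]

has only the trivial solution, where 0 is the 2 × 2 zero matrix; and to prove that the matrices span
$M_{22}$ we must show that every 2 × 2 matrix

\[ B = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

can be expressed as

\[ c_1M_1 + c_2M_2 + c_3M_3 + c_4M_4 = B \tag{5} \]

The matrix forms of Equations (4) and (5) are

\[
c_1\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
+ c_2\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
+ c_3\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
+ c_4\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]

and

\[
c_1\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
+ c_2\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
+ c_3\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
+ c_4\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

which can be rewritten as

\[
\begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

Since the first equation has only the trivial solution

\[ c_1 = c_2 = c_3 = c_4 = 0 \]

the matrices are linearly independent, and since the second equation has the solution

\[ c_1 = a, \quad c_2 = b, \quad c_3 = c, \quad c_4 = d \]

the matrices span $M_{22}$. This proves that the matrices $M_1$, $M_2$, $M_3$, $M_4$ form a basis for $M_{22}$.

More generally, the mn different matrices whose entries are zero except for a single entry of 1 form
a basis for $M_{mn}$ called the standard basis for $M_{mn}$.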


Some  writers  define  the  empty  set  to  be  a basis 
for  the  zero  vector  space,  but  we  will  not  do  so. 


It  is  not  true  that  every  vector  space  has  a basis  in  the  sense  of  Definition  1 . The  simplest  example  is  the  zero 
vector  space,  which  contains  no  linearly  independent  sets  and  hence  no  basis.  The  following  is  an  example  of  a 
nonzero  vector  space  that  has  no  basis  in  the  sense  of  Definition  1 because  it  cannot  be  spanned  by  finitely  many 
vectors. 

EXAMPLE  5 A Vector  Space  That  Has  No  Finite  Spanning  Set 

Show that the vector space $P_\infty$ of all polynomials with real coefficients has no finite spanning set.

Solution  If there were a finite spanning set, say $S = \{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_r\}$, then the degrees of the
polynomials in S would have a maximum value, say n; and this in turn would imply that any linear
combination of the polynomials in S would have degree at most n. Thus, there would be no way to
express the polynomial $x^{n+1}$ as a linear combination of the polynomials in S, contradicting the fact that
the vectors in S span $P_\infty$.


For  reasons  that  will  become  clear  shortly,  a vector  space  that  cannot  be  spanned  by  finitely  many  vectors  is  said 
to  be  infinite-dimensional , whereas  those  that  can  are  said  to  be  finite-dimensional. 


EXAMPLE  6 Some  Finite-and  Infinite-Dimensional  Spaces 


In Example 1, Example 2, and Example 4 we found bases for $R^n$, $P_n$, and $M_{mn}$, so these vector
spaces are finite-dimensional. We showed in Example 5 that the vector space $P_\infty$ is not spanned by
finitely many vectors and hence is infinite-dimensional. In the exercises of this section and the next
we will ask you to show that the vector spaces $R^\infty$, $F(-\infty, \infty)$, $C(-\infty, \infty)$, $C^m(-\infty, \infty)$, and
$C^\infty(-\infty, \infty)$ are infinite-dimensional.


Coordinates  Relative  to  a Basis 

Earlier  in  this  section  we  drew  an  informal  analogy  between  basis  vectors  and  coordinate  systems.  Our  next  goal  is 
to  make  this  informal  idea  precise  by  defining  the  notion  of  a coordinate  system  in  a general  vector  space.  The 
following  theorem  will  be  our  first  step  in  that  direction. 


THEOREM 4.4.1  Uniqueness of Basis Representation

If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for a vector space V, then every vector v in V can be expressed in the
form $\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$ in exactly one way.

Proof  Since S spans V, it follows from the definition of a spanning set that every vector in V is expressible as a
linear combination of the vectors in S. To see that there is only one way to express a vector as a linear combination
of the vectors in S, suppose that some vector v can be written as

\[ \mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n \]

and also as

\[ \mathbf{v} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n \]

Subtracting the second equation from the first gives

\[ \mathbf{0} = (c_1 - k_1)\mathbf{v}_1 + (c_2 - k_2)\mathbf{v}_2 + \cdots + (c_n - k_n)\mathbf{v}_n \]

Since the right side of this equation is a linear combination of vectors in S, the linear independence of S implies
that

\[ c_1 - k_1 = 0, \quad c_2 - k_2 = 0, \quad \ldots, \quad c_n - k_n = 0 \]

that is,

\[ c_1 = k_1, \quad c_2 = k_2, \quad \ldots, \quad c_n = k_n \]

Thus, the two expressions for v are the same.


Figure  4.4.5 


Sometimes it will be desirable to write a
coordinate vector as a column matrix, in which
case we will denote it using square brackets as

\[ [\mathbf{v}]_S = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \]

We will refer to $[\mathbf{v}]_S$ as a coordinate matrix and
reserve the terminology coordinate vector for the
comma-delimited form $(\mathbf{v})_S$.


We now have all of the ingredients required to define the notion of “coordinates” in a general vector space V. For
motivation, observe that in $R^3$, for example, the coordinates (a, b, c) of a vector v are precisely the coefficients in
the formula

\[ \mathbf{v} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k} \]

that expresses v as a linear combination of the standard basis vectors for $R^3$ (see Figure 4.4.5). The following
definition generalizes this idea.


DEFINITION  2 

If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for a vector space V, and

\[ \mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n \]

is the expression for a vector v in terms of the basis S, then the scalars $c_1, c_2, \ldots, c_n$ are called the
coordinates of v relative to the basis S. The vector $(c_1, c_2, \ldots, c_n)$ in $R^n$ constructed from these
coordinates is called the coordinate vector of v relative to S; it is denoted by

\[ (\mathbf{v})_S = (c_1, c_2, \ldots, c_n) \tag{6} \]


Recall that two sets are considered to be the same if they have the same members, even if those
members are written in a different order. However, if $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a set of basis vectors, then changing
the order in which the vectors are written would change the order of the entries in $(\mathbf{v})_S$, possibly producing a
different coordinate vector. To avoid this complication, we will make the convention that in any discussion
involving a basis S the order of the vectors in S remains fixed. Some authors call a set of basis vectors with this
restriction an ordered basis. However, we will use this terminology only when emphasis on the order is required
for clarity.


Observe that $(\mathbf{v})_S$ is a vector in $R^n$, so that once a basis S is given for a vector space V, Theorem 4.4.1 establishes a
one-to-one correspondence between vectors in V and vectors in $R^n$ (Figure 4.4.6).

Figure 4.4.6  A one-to-one correspondence between a vector v in V and its coordinate vector $(\mathbf{v})_S$ in $R^n$


EXAMPLE  7 Coordinates  Relative  to  the  Standard  Basis  for  Rn 

In the special case where $V = R^n$ and S is the standard basis, the coordinate vector $(\mathbf{v})_S$ and the vector
v are the same; that is,

\[ \mathbf{v} = (\mathbf{v})_S \]

For example, in $R^3$ the representation of a vector v = (a, b, c) as a linear combination of the vectors in
the standard basis $S = \{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$ is

\[ \mathbf{v} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k} \]

so the coordinate vector relative to this basis is $(\mathbf{v})_S = (a, b, c)$, which is the same as the vector v.


EXAMPLE  8 Coordinate  Vectors  Relative  to  Standard  Bases 


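(a) Find the coordinate vector for the polynomial

\[ \mathbf{p}(x) = c_0 + c_1x + c_2x^2 + \cdots + c_nx^n \]

relative to the standard basis for the vector space $P_n$.

(b) Find the coordinate vector of

\[ B = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

relative to the standard basis for $M_{22}$.

Solution (a)  The given formula for p(x) expresses this polynomial as a linear combination of the standard
basis vectors $S = \{1, x, x^2, \ldots, x^n\}$. Thus, the coordinate vector for p relative to S is

\[ (\mathbf{p})_S = (c_0, c_1, c_2, \ldots, c_n) \]

Solution (b)  We showed in Example 4 that the representation of a vector

\[ B = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

as a linear combination of the standard basis vectors is

\[
B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
= a\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
+ b\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
+ c\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
+ d\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
\]

so the coordinate vector of B relative to S is

\[ (B)_S = (a, b, c, d) \]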


EXAMPLE  9 Coordinates  in  R3 

(a) We showed in Example 3 that the vectors

\[ \mathbf{v}_1 = (1, 2, 1), \quad \mathbf{v}_2 = (2, 9, 0), \quad \mathbf{v}_3 = (3, 3, 4) \]

form a basis for $R^3$. Find the coordinate vector of $\mathbf{v} = (5, -1, 9)$ relative to the basis
$S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$.

(b) Find the vector v in $R^3$ whose coordinate vector relative to S is $(\mathbf{v})_S = (-1, 3, 2)$.

Solution (a)  To find $(\mathbf{v})_S$ we must first express v as a linear combination of the vectors in S; that is, we must
find values of $c_1$, $c_2$, and $c_3$ such that

\[ \mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 \]

or, in terms of components,

\[ (5, -1, 9) = c_1(1, 2, 1) + c_2(2, 9, 0) + c_3(3, 3, 4) \]

Equating corresponding components gives

\[
\begin{aligned}
c_1 + 2c_2 + 3c_3 &= 5 \\
2c_1 + 9c_2 + 3c_3 &= -1 \\
c_1 \phantom{{}+ 2c_2} + 4c_3 &= 9
\end{aligned}
\]

Solving this system we obtain $c_1 = 1$, $c_2 = -1$, $c_3 = 2$ (verify). Therefore,

\[ (\mathbf{v})_S = (1, -1, 2) \]

Solution (b)  Using the definition of $(\mathbf{v})_S$, we obtain

\[
\mathbf{v} = (-1)\mathbf{v}_1 + 3\mathbf{v}_2 + 2\mathbf{v}_3
= (-1)(1, 2, 1) + 3(2, 9, 0) + 2(3, 3, 4) = (11, 31, 7)
\]
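Both directions of Example 9 (from a vector to its coordinates, and from coordinates back to a vector) can be checked mechanically. The following Python sketch is illustrative only: the function names `coordinates` and `from_coordinates` are assumptions, and Cramer's rule is used here for brevity rather than because the text prescribes it.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def coordinates(basis, v):
    """Coordinates of v relative to a basis of R^3, computed by Cramer's rule."""
    # The columns of the coefficient matrix are the basis vectors.
    A = [[basis[j][i] for j in range(3)] for i in range(3)]
    d = det3(A)
    if d == 0:
        raise ValueError("the given vectors are not a basis for R^3")
    coords = []
    for k in range(3):
        # Replace column k of A by v and take the ratio of determinants.
        Ak = [[v[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]
        coords.append(Fraction(det3(Ak), d))
    return tuple(coords)

def from_coordinates(basis, coords):
    """Rebuild the vector c1*v1 + c2*v2 + c3*v3 from its coordinates."""
    return tuple(sum(c * b[i] for c, b in zip(coords, basis)) for i in range(3))

S = [(1, 2, 1), (2, 9, 0), (3, 3, 4)]
print(coordinates(S, (5, -1, 9)))       # coordinates 1, -1, 2 (as Fractions)
print(from_coordinates(S, (-1, 3, 2)))  # (11, 31, 7)
```

Exact fractions are used so that the coordinates are not blurred by floating-point error; for larger systems Gaussian elimination would be the more practical choice.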


Concept  Review 

Basis 

Standard bases for $R^n$, $P_n$, $M_{mn}$
Finite-dimensional
Infinite-dimensional 
Coordinates 
Coordinate  vector 

Skills 

Show  that  a set  of  vectors  is  a basis  for  a vector  space. 
Find  the  coordinates  of  a vector  relative  to  a basis. 

Find  the  coordinate  vector  of  a vector  relative  to  a basis. 


Exercise  Set  4.4 


1.  In  words,  explain  why  the  following  sets  of  vectors  are  not  bases  for  the  indicated  vector  spaces. 

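(a) $\mathbf{u}_1 = (1, 2)$, $\mathbf{u}_2 = (0, 3)$, $\mathbf{u}_3 = (2, 7)$ for $R^2$

(b) $\mathbf{u}_1 = (-1, 3, 2)$, $\mathbf{u}_2 = (6, 1, 1)$ for $R^3$

(c) $\mathbf{p}_1 = 1 + x + x^2$, $\mathbf{p}_2 = x - 1$ for $P_2$

(d) $A = \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix}$, $B = \begin{bmatrix} 6 & 0 \\ -1 & 4 \end{bmatrix}$, $C = \begin{bmatrix} 3 & 0 \\ 1 & 7 \end{bmatrix}$, $D = \begin{bmatrix} 5 & 1 \\ 4 & 2 \end{bmatrix}$, $E = \begin{bmatrix} 1 & 1 \\ 2 & 9 \end{bmatrix}$ for $M_{22}$

Answer:

(a) A basis for $R^2$ has two linearly independent vectors.

(b) A basis for $R^3$ has three linearly independent vectors.

(c) A basis for $P_2$ has three linearly independent vectors.

(d) A basis for $M_{22}$ has four linearly independent vectors.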

2.  Which of the following sets of vectors are bases for $R^2$?

(a) {(2, 1), (3, 0)}

(b) {(4, 1), (-7, -8)}

(c) {(0, 0), (1, 3)}

(d) {(3, 9), (-4, -12)}

3.  Which of the following sets of vectors are bases for $R^3$?

(a) {(1, 0, 0), (2, 2, 0), (3, 3, 3)}

(b) {(3, 1, -4), (2, 5, 6), (1, 4, 8)}

(c) {(2, -3, 1), (4, 1, 1), (0, -7, 1)}

(d) {(1, 6, 4), (2, 4, -1), (-1, 2, 5)}


Answer: 

(a),(b) 

4.  Which of the following form bases for $P_2$?

(a) $1 - 3x + 2x^2$, $1 + x + 4x^2$, $1 - 7x$

(b) $4 + 6x + x^2$, $-1 + 4x + 2x^2$, $5 + 2x - x^2$

(c) $1 + x + x^2$, $x + x^2$, $x^2$

(d) $-4 + x + 3x^2$, $6 + 5x + 2x^2$, $8 + 4x + x^2$

5.  Show that the following matrices form a basis for $M_{22}$.


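6.  Let V be the space spanned by $\mathbf{v}_1 = \cos^2 x$, $\mathbf{v}_2 = \sin^2 x$, $\mathbf{v}_3 = \cos 2x$.

(a) Show that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is not a basis for V.

(b) Find a basis for V.

7.  Find the coordinate vector of w relative to the basis $S = \{\mathbf{u}_1, \mathbf{u}_2\}$ for $R^2$.

(a) $\mathbf{u}_1 = (1, 0)$, $\mathbf{u}_2 = (0, 1)$; $\mathbf{w} = (3, -7)$

(b) $\mathbf{u}_1 = (2, -4)$, $\mathbf{u}_2 = (3, 8)$; $\mathbf{w} = (1, 1)$

(c) $\mathbf{u}_1 = (1, 1)$, $\mathbf{u}_2 = (0, 2)$; $\mathbf{w} = (a, b)$

Answer:

(a) $(\mathbf{w})_S = (3, -7)$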


8. Find the coordinate vector of w relative to the basis $S = \{u_1, u_2\}$ of $R^2$.

(a) $u_1 = (1, -1)$, $u_2 = (1, 1)$; $w = (1, 0)$

(b) $u_1 = (1, -1)$, $u_2 = (1, 1)$; $w = (0, 1)$

(c) $u_1 = (1, -1)$, $u_2 = (1, 1)$; $w = (1, 1)$

9. Find the coordinate vector of v relative to the basis $S = \{v_1, v_2, v_3\}$.

(a) $v = (2, -1, 3)$; $v_1 = (1, 0, 0)$, $v_2 = (2, 2, 0)$, $v_3 = (3, 3, 3)$

(b) $v = (5, -12, 3)$; $v_1 = (1, 2, 3)$, $v_2 = (-4, 5, 6)$, $v_3 = (7, -8, 9)$

Answer:

(a) $(v)_S = (3, -2, 1)$

(b) $(v)_S = (-2, 0, 1)$


10. Find the coordinate vector of p relative to the basis $S = \{p_1, p_2, p_3\}$.

(a) $p = 4 - 3x + x^2$; $p_1 = 1$, $p_2 = x$, p_3 =

(b) $p = 2 - x + x^2$; $p_1 = 1 + x$, $p_2 = 1 + x^2$, $p_3 = x + x^2$


11. Find the coordinate vector of A relative to the basis $S = \{A_1, A_2, A_3, A_4\}$.


2 0 . 
-1  3 ’ 


-1  1 
0 0 ’ 


^3  = 


Answer: 

$(A)_S = (-1, 1, -1, 3)$

In Exercises 12-13, show that $\{A_1, A_2, A_3, A_4\}$ is a basis for $M_{22}$, and express A as a linear combination of the basis vectors.


Answer:

$A = A_1 - A_2 + A_3 - A_4$

In Exercises 14-15, show that $\{p_1, p_2, p_3\}$ is a basis for $P_2$, and express p as a linear combination of the basis vectors.

14. $p_1 = 1 + 2x + x^2$, $p_2 = 2 + 9x$, $p_3 = 3 + 3x + 4x^2$; $p = 2 + 17x - 3x^2$

15. $p_1 = 1 + x + x^2$, $p_2 = x + x^2$, $p_3 = x^2$; $p = 7 - x + 2x^2$

Answer:

$p = 7p_1 - 8p_2 + 3p_3$

16. The accompanying figure shows a rectangular xy-coordinate system and an x'y'-coordinate system with skewed axes. Assuming that 1-unit scales are used on all the axes, find the x'y'-coordinates of the points whose xy-coordinates are given.

(a) (1, 1)

(b) (1, 0)

(c) (0, 1)

(d) (a, b)


17. The accompanying figure shows a rectangular xy-coordinate system determined by the unit basis vectors i and j and an x'y'-coordinate system determined by unit basis vectors $u_1$ and $u_2$. Find the x'y'-coordinates of the points whose xy-coordinates are given.

(a) $(\sqrt{3}, 1)$

(b) (1, 0)

(c) (0, 1)

(d) (a, b)

Figure Ex-17 (axes labeled x', and y and y')


Answer:

(a) (2, 0)

(b) $(2/\sqrt{3}, -1/\sqrt{3})$

(c) (0, 1)

(d) $(2a/\sqrt{3}, b - a/\sqrt{3})$


18. The basis that we gave for $M_{22}$ in Example 4 consisted of noninvertible matrices. Do you think that there is a basis for $M_{22}$ consisting of invertible matrices? Justify your answer.

19. Prove that $R^\infty$ is infinite-dimensional.


True-False  Exercises 


In parts (a)-(e) determine whether the statement is true or false, and justify your answer.

(a) If $V = \operatorname{span}\{v_1, \ldots, v_n\}$, then $\{v_1, \ldots, v_n\}$ is a basis for V.


Answer: 


False 

(b)  Every  linearly  independent  subset  of  a vector  space  V is  a basis  for  V. 

Answer: 

False 

(c) If $\{v_1, v_2, \ldots, v_n\}$ is a basis for a vector space V, then every vector in V can be expressed as a linear combination of $v_1, v_2, \ldots, v_n$.

Answer: 

True 

(d)  The  coordinate  vector  of  a vector  x in  Rn  relative  to  the  standard  basis  for  Rn  is  x. 

Answer: 

True 

(e)  Every  basis  of  P4  contains  at  least  one  polynomial  of  degree  3 or  less. 

Answer: 

False 
4.5  Dimension

We showed in the previous section that the standard basis for $R^n$ has n vectors and hence that the standard basis for $R^3$ has three vectors, the standard basis for $R^2$ has two vectors, and the standard basis for $R^1 (= R)$ has one vector. Since we think of space as three dimensional, a plane as two dimensional, and a line as one dimensional, there seems to be a link between the number of vectors in a basis and the dimension of a vector space. We will develop this idea in this section.


Number  of  Vectors  in  a Basis 

Our  first  goal  in  this  section  is  to  establish  the  following  fundamental  theorem. 


THEOREM  4.5.1 

All  bases  for  a finite-dimensional  vector  space  have  the  same  number  of  vectors. 


To  prove  this  theorem  we  will  need  the  following  preliminary  result,  whose  proof  is  deferred  to  the  end  of  the 
section. 


THEOREM  4.5.2 

Let V be a finite-dimensional vector space, and let $\{v_1, v_2, \ldots, v_n\}$ be any basis.

(a)  If  a set  has  more  than  n vectors,  then  it  is  linearly  dependent. 

(b)  If  a set  has  fewer  than  n vectors,  then  it  does  not  span  V. 


Some  writers  regard  the  empty  set  to  be  a basis 
for  the  zero  vector  space.  This  is  consistent  with 
our  definition  of  dimension,  since  the  empty  set 
has  no  vectors  and  the  zero  vector  space  has 
dimension  zero. 


We can now see rather easily why Theorem 4.5.1 is true; for if

$S = \{v_1, v_2, \ldots, v_n\}$

is an arbitrary basis for V, then the linear independence of S implies that any set in V with more than n vectors is linearly dependent and any set in V with fewer than n vectors does not span V. Thus, unless a set in V has exactly n vectors it cannot be a basis.


We  noted  in  the  introduction  to  this  section  that  for  certain  familiar  vector  spaces  the  intuitive  notion  of 
dimension  coincides  with  the  number  of  vectors  in  a basis.  The  following  definition  makes  this  idea  precise. 

Engineers  often  use  the  term  degrees  of 
freedom  as  a synonym  for  dimension. 




DEFINITION  1 

The dimension of a finite-dimensional vector space V is denoted by dim(V) and is defined to be the number of vectors in a basis for V. In addition, the zero vector space is defined to have dimension zero.


EXAMPLE  1 


Dimensions  of  Some  Familiar  Vector  Spaces 


$\dim(R^n) = n$ (The standard basis has n vectors.)
$\dim(P_n) = n + 1$ (The standard basis has n + 1 vectors.)
$\dim(M_{mn}) = mn$ (The standard basis has mn vectors.)


EXAMPLE  2 Dimension  of  Span(S) 

If $S = \{v_1, v_2, \ldots, v_r\}$ is a linearly independent set in a vector space V, then S is automatically a basis for span(S) (why?), and this implies that

$\dim[\operatorname{span}(S)] = r$

In  words,  the  dimension  of  the  space  spanned  by  a linearly  independent  set  of  vectors  is  equal  to 
the  number  of  vectors  in  that  set. 


EXAMPLE  3 Dimension  of  a Solution  Space 

Find  a basis  for  and  the  dimension  of  the  solution  space  of  the  homogeneous  system 

\[
\begin{aligned}
2x_1 + 2x_2 - x_3 + x_5 &= 0 \\
-x_1 - x_2 + 2x_3 - 3x_4 + x_5 &= 0 \\
x_1 + x_2 - 2x_3 - x_5 &= 0 \\
x_3 + x_4 + x_5 &= 0
\end{aligned}
\]

We leave it for you to solve this system by Gauss-Jordan elimination and show that its general solution is

$x_1 = -s - t$, $x_2 = s$, $x_3 = -t$, $x_4 = 0$, $x_5 = t$

which can be written in vector form as

$(x_1, x_2, x_3, x_4, x_5) = (-s - t, s, -t, 0, t)$

or, alternatively, as

$(x_1, x_2, x_3, x_4, x_5) = s(-1, 1, 0, 0, 0) + t(-1, 0, -1, 0, 1)$

This shows that the vectors $v_1 = (-1, 1, 0, 0, 0)$ and $v_2 = (-1, 0, -1, 0, 1)$ span the solution space. Since neither vector is a scalar multiple of the other, they are linearly independent and hence form a basis for the solution space. Thus, the solution space has dimension 2.
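The Gauss-Jordan computation left to the reader in Example 3 can be sketched in code. The following is a minimal illustration, not part of the text: the helper names `rref` and `null_space_basis` are hypothetical, and exact arithmetic with Python's standard `fractions` module is an implementation choice. It reduces the coefficient matrix and reads off one basis vector per free variable.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivot_columns) for the reduced row echelon form of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot here: column c belongs to a free variable
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space_basis(rows):
    """One basis vector of the solution space of Ax = 0 per free variable."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)            # set this free variable to 1
        for row_index, p in enumerate(pivots):
            v[p] = -R[row_index][f]   # solve for the leading variables
        basis.append(v)
    return basis

# Coefficient matrix of the homogeneous system in Example 3.
A = [[2, 2, -1, 0, 1],
     [-1, -1, 2, -3, 1],
     [1, 1, -2, 0, -1],
     [0, 0, 1, 1, 1]]
basis = null_space_basis(A)
print([[int(x) for x in v] for v in basis])
print("dimension =", len(basis))
```

The two vectors produced agree with $v_1$ and $v_2$ in the example, and the number of vectors returned is the dimension of the solution space.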


EXAMPLE  4 Dimension  of  a Solution  Space 

Find  a basis  for  and  the  dimension  of  the  solution  space  of  the  homogeneous  system 

\[
\begin{aligned}
x_1 + 3x_2 - 2x_3 + 2x_5 &= 0 \\
2x_1 + 6x_2 - 5x_3 - 2x_4 + 4x_5 - 3x_6 &= 0 \\
5x_3 + 10x_4 + 15x_6 &= 0 \\
2x_1 + 6x_2 + 8x_4 + 4x_5 + 18x_6 &= 0
\end{aligned}
\]

In Example 6 of Section 1.2 we found the solution of this system to be

$x_1 = -3r - 4s - 2t$, $x_2 = r$, $x_3 = -2s$, $x_4 = s$, $x_5 = t$, $x_6 = 0$

which can be written in vector form as

$(x_1, x_2, x_3, x_4, x_5, x_6) = (-3r - 4s - 2t, r, -2s, s, t, 0)$

or, alternatively, as

$(x_1, x_2, x_3, x_4, x_5, x_6) = r(-3, 1, 0, 0, 0, 0) + s(-4, 0, -2, 1, 0, 0) + t(-2, 0, 0, 0, 1, 0)$

This shows that the vectors

$v_1 = (-3, 1, 0, 0, 0, 0)$, $v_2 = (-4, 0, -2, 1, 0, 0)$, $v_3 = (-2, 0, 0, 0, 1, 0)$

span the solution space. We leave it for you to check that these vectors are linearly independent by showing that none of them is a linear combination of the other two (but see the remark that follows). Thus, the solution space has dimension 3.


It  can  be  shown  that  for  a homogeneous  linear  system,  the  method  of  the  last  example  always 
produces  a basis  for  the  solution  space  of  the  system.  We  omit  the  formal  proof. 
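The check left to the reader in Example 4 can also be done mechanically. The sketch below is an illustration under assumptions, not the text's method: it uses a hypothetical helper `rank` (forward elimination with exact fractions) and relies on the standard fact that r vectors are linearly independent exactly when the matrix having them as rows has rank r. It also confirms that each vector satisfies every equation of the system.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by forward elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

# Coefficient matrix of the system in Example 4.
A = [[1, 3, -2, 0, 2, 0],
     [2, 6, -5, -2, 4, -3],
     [0, 0, 5, 10, 0, 15],
     [2, 6, 0, 8, 4, 18]]

# The three vectors read off from the general solution.
vs = [[-3, 1, 0, 0, 0, 0],
      [-4, 0, -2, 1, 0, 0],
      [-2, 0, 0, 0, 1, 0]]

# Each vector must satisfy every equation of the homogeneous system.
solves = all(sum(a * x for a, x in zip(row, v)) == 0 for v in vs for row in A)

# Three vectors are independent exactly when their matrix has rank 3.
independent = rank(vs) == len(vs)
print(solves, independent)
```

Independence is visible here without any computation as well: each vector has a 1 in a position (second, fourth, and fifth, respectively) where the other two have 0.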


Some  Fundamental  Theorems 

We  will  devote  the  remainder  of  this  section  to  a series  of  theorems  that  reveal  the  subtle  interrelationships 
among  the  concepts  of  linear  independence,  basis,  and  dimension.  These  theorems  are  not  simply  exercises  in 
mathematical  theory — they  are  essential  to  the  understanding  of  vector  spaces  and  the  applications  that  build 
on  them. 


We  will  start  with  a theorem  (proved  at  the  end  of  this  section)  that  is  concerned  with  the  effect  on  linear 
independence  and  spanning  if  a vector  is  added  to  or  removed  from  a given  nonempty  set  of  vectors. 
Informally  stated,  if  you  start  with  a linearly  independent  set  S and  adjoin  to  it  a vector  that  is  not  a linear 
combination  of  those  in  S , then  the  enlarged  set  will  still  be  linearly  independent.  Also,  if  you  start  with  a set  S 
of  two  or  more  vectors  in  which  one  of  the  vectors  is  a linear  combination  of  the  others,  then  that  vector  can  be 
removed  from  S without  affecting  span(S)  (Figure  4.5.1). 


The  vector  outside  the  plane 
can  be  adjoined  to  the  other 
two  without  affecting  their 
linear  independence. 


Any  of  the  vectors  can 
be  removed,  and  the 
remaining  two  will  still 
span  the  plane. 


Either  of  the  collinear 
vectors  can  be  removed, 
and  the  remaining  two 
will  still  span  the  plane. 


Figure  4.5.1 


THEOREM 4.5.3 Plus/Minus Theorem

Let S be a nonempty set of vectors in a vector space V.

(a) If S is a linearly independent set, and if v is a vector in V that is outside of span(S), then the set $S \cup \{v\}$ that results by inserting v into S is still linearly independent.

(b) If v is a vector in S that is expressible as a linear combination of other vectors in S, and if $S - \{v\}$ denotes the set obtained by removing v from S, then S and $S - \{v\}$ span the same space; that is,

$\operatorname{span}(S) = \operatorname{span}(S - \{v\})$


EXAMPLE  5 Applying  the  Plus/Minus  Theorem 

Show  that  pj  — ] = and  are  linearly  independent  vectors. 

The set $S = \{p_1, p_2\}$ is linearly independent, since neither vector in S is a scalar multiple of the other. Since the vector $p_3$ cannot be expressed as a linear combination of the vectors in S (why?), it can be adjoined to S to produce a linearly independent set $S' = \{p_1, p_2, p_3\}$.


In general, to show that a set of vectors $\{v_1, v_2, \ldots, v_n\}$ is a basis for a vector space V, we must show that the vectors are linearly independent and span V. However, if we happen to know that V has dimension n (so that $\{v_1, v_2, \ldots, v_n\}$ contains the right number of vectors for a basis), then it suffices to check either linear independence or spanning; the remaining condition will hold automatically. This is the content of the following theorem.


THEOREM  4.5.4 

Let V be an n-dimensional vector space, and let S be a set in V with exactly n vectors. Then S is a basis for V if and only if S spans V or S is linearly independent.


Proof Assume that S has exactly n vectors and spans V. To prove that S is a basis, we must show that S is a linearly independent set. But if this is not so, then some vector v in S is a linear combination of the remaining vectors. If we remove this vector from S, then it follows from Theorem 4.5.3b that the remaining set of n - 1 vectors still spans V. But this is impossible, since it follows from Theorem 4.5.2b that no set with fewer than n vectors can span an n-dimensional vector space. Thus S is linearly independent.

Assume that S has exactly n vectors and is a linearly independent set. To prove that S is a basis, we must show that S spans V. But if this is not so, then there is some vector v in V that is not in span(S). If we insert this vector into S, then it follows from Theorem 4.5.3a that this set of n + 1 vectors is still linearly independent. But this is impossible, since Theorem 4.5.2a states that no set with more than n vectors in an n-dimensional vector space can be linearly independent. Thus S spans V.

EXAMPLE  6 Bases  by  Inspection 

(a) By inspection, explain why $v_1 = (-3, 7)$ and $v_2 = (5, 5)$ form a basis for $R^2$.

(b) By inspection, explain why $v_1 = (2, 0, -1)$, $v_2 = (4, 0, 7)$, and $v_3 = (-1, 1, 4)$ form a basis for $R^3$.

Solution

(a) Since neither vector is a scalar multiple of the other, the two vectors form a linearly independent set in the two-dimensional space $R^2$, and hence they form a basis by Theorem 4.5.4.

(b) The vectors $v_1$ and $v_2$ form a linearly independent set in the xz-plane (why?). The vector $v_3$ is outside of the xz-plane, so the set $\{v_1, v_2, v_3\}$ is also linearly independent. Since $R^3$ is three-dimensional, Theorem 4.5.4 implies that $\{v_1, v_2, v_3\}$ is a basis for $R^3$.
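The inspection arguments of Example 6 can be cross-checked numerically. The sketch below is an illustration only and assumes a standard fact not restated in this section: n vectors in $R^n$ are linearly independent exactly when the determinant of the matrix formed from them is nonzero. By Theorem 4.5.4, independence of n vectors in an n-dimensional space is enough for a basis. The helper names `det2` and `det3` are hypothetical.

```python
def det2(u, v):
    """Determinant of the 2 x 2 matrix with rows u and v."""
    return u[0] * v[1] - u[1] * v[0]

def det3(a, b, c):
    """Determinant of the 3 x 3 matrix with rows a, b, c (cofactor expansion along a)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

# Part (a): two vectors in R^2.
d2 = det2((-3, 7), (5, 5))

# Part (b): three vectors in R^3.
d3 = det3((2, 0, -1), (4, 0, 7), (-1, 1, 4))

# A nonzero determinant means the vectors are linearly independent,
# so by Theorem 4.5.4 each set is a basis.
print(d2 != 0, d3 != 0)
```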


The  next  theorem  (whose  proof  is  deferred  to  the  end  of  this  section)  reveals  two  important  facts  about  the 
vectors  in  a finite-dimensional  vector  space  V: 

Every  spanning  set  for  a subspace  is  either  a basis  for  that  subspace  or  has  a basis  as  a subset. 

Every  linearly  independent  set  in  a subspace  is  either  a basis  for  that  subspace  or  can  be  extended  to  a basis 
for  it. 


THEOREM  4.5.5 


Let  S be  a finite  set  of  vectors  in  a finite-dimensional  vector  space  V. 

(a) If S spans V but is not a basis for V, then S can be reduced to a basis for V by removing appropriate vectors from S.

(b) If S is a linearly independent set that is not already a basis for V, then S can be enlarged to a basis for V by inserting appropriate vectors into S.
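Both parts of Theorem 4.5.5 can be carried out mechanically for vectors in $R^n$. The following sketch is one possible procedure, not the text's proof: the helper names `rank`, `reduce_to_basis`, and `extend_to_basis` are hypothetical, and the sample vectors in the first call are made up for illustration. Part (a) is done by keeping only vectors that are not linear combinations of those already kept, and part (b) by trying the standard basis vectors as candidates to insert.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by forward elimination (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def reduce_to_basis(spanning):
    """Part (a): drop vectors that depend on those already kept."""
    basis = []
    for v in spanning:
        if rank(basis + [v]) > len(basis):  # v is outside span(basis)
            basis.append(v)
    return basis

def extend_to_basis(independent, n):
    """Part (b): insert standard basis vectors of R^n that enlarge the span."""
    basis = list(independent)
    for i in range(n):
        e = [1 if j == i else 0 for j in range(n)]
        if rank(basis + [e]) > len(basis):  # Plus/Minus Theorem, part (a)
            basis.append(e)
    return basis

# A made-up spanning set for R^2 with two redundant vectors.
reduced = reduce_to_basis([[1, 0], [2, 0], [0, 1], [1, 1]])

# The independent pair from Exercise 15 of Exercise Set 4.5, enlarged to a basis for R^3.
extended = extend_to_basis([[1, -2, 3], [0, 5, -3]], 3)
print(reduced, extended)
```

Trying standard basis vectors is only one convenient choice of candidates; any vector outside the current span would serve equally well.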


We  conclude  this  section  with  a theorem  that  relates  the  dimension  of  a vector  space  to  the  dimensions  of  its 
subspaces. 


THEOREM  4.5.6 

If W is a subspace of a finite-dimensional vector space V, then:

(a) W is finite-dimensional.

(b) $\dim(W) \le \dim(V)$.

(c) W = V if and only if dim(W) = dim(V).


Proof (a) We will leave the proof of this part for the exercises.

Proof (b) Part (a) shows that W is finite-dimensional, so it has a basis

$S = \{w_1, w_2, \ldots, w_m\}$

Either S is also a basis for V or it is not. If so, then dim(V) = m, which means that dim(V) = dim(W). If not, then because S is a linearly independent set it can be enlarged to a basis for V by part (b) of Theorem 4.5.5. But this implies that dim(W) < dim(V), so we have shown that $\dim(W) \le \dim(V)$ in all cases.

Proof (c) Assume that dim(W) = dim(V) and that

$S = \{w_1, w_2, \ldots, w_m\}$

is a basis for W. If S is not also a basis for V, then being linearly independent S can be extended to a basis for V by part (b) of Theorem 4.5.5. But this would mean that dim(V) > dim(W), which contradicts our hypothesis. Thus S must also be a basis for V, which means that W = V.


Figure 4.5.2 illustrates the geometric relationship between the subspaces of $R^3$ in order of increasing dimension.


OPTIONAL 

We  conclude  this  section  with  optional  proofs  of  Theorem  4.5.2,  Theorem  4.5.3,  and  Theorem  4.5.5. 

Proof of Theorem 4.5.2(a) Let $S' = \{w_1, w_2, \ldots, w_m\}$ be any set of m vectors in V, where m > n. We want to show that S' is linearly dependent. Since $S = \{v_1, v_2, \ldots, v_n\}$ is a basis, each $w_i$ can be expressed as a linear combination of the vectors in S, say

\[
\begin{aligned}
w_1 &= a_{11}v_1 + a_{21}v_2 + \cdots + a_{n1}v_n \\
w_2 &= a_{12}v_1 + a_{22}v_2 + \cdots + a_{n2}v_n \\
&\;\;\vdots \\
w_m &= a_{1m}v_1 + a_{2m}v_2 + \cdots + a_{nm}v_n
\end{aligned}
\tag{1}
\]

To show that S' is linearly dependent, we must find scalars $k_1, k_2, \ldots, k_m$, not all zero, such that

\[
k_1w_1 + k_2w_2 + \cdots + k_mw_m = 0 \tag{2}
\]

Using the equations in (1), we can rewrite (2) as

\[
\begin{aligned}
&(k_1a_{11} + k_2a_{12} + \cdots + k_ma_{1m})v_1 \\
&\quad + (k_1a_{21} + k_2a_{22} + \cdots + k_ma_{2m})v_2 \\
&\quad + \cdots \\
&\quad + (k_1a_{n1} + k_2a_{n2} + \cdots + k_ma_{nm})v_n = 0
\end{aligned}
\]

Thus, from the linear independence of S, the problem of proving that S' is a linearly dependent set reduces to showing there are scalars $k_1, k_2, \ldots, k_m$, not all zero, that satisfy

\[
\begin{aligned}
a_{11}k_1 + a_{12}k_2 + \cdots + a_{1m}k_m &= 0 \\
a_{21}k_1 + a_{22}k_2 + \cdots + a_{2m}k_m &= 0 \\
&\;\;\vdots \\
a_{n1}k_1 + a_{n2}k_2 + \cdots + a_{nm}k_m &= 0
\end{aligned}
\tag{3}
\]

But (3) has more unknowns than equations, so the proof is complete since Theorem 1.2.2 guarantees the existence of nontrivial solutions.
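As a small illustration of the argument for part (a), here is a worked instance with m = 3 vectors in $R^2$ (so n = 2). The vectors are a made-up example, not taken from the text, and the standard basis of $R^2$ plays the role of S, so the coefficients $a_{ij}$ are simply the components of the $w_j$.

```latex
% Hypothetical example: w_1 = (1, 2), w_2 = (3, 4), w_3 = (5, 6) in R^2.
% The dependency equation k_1 w_1 + k_2 w_2 + k_3 w_3 = 0 becomes a
% homogeneous system with 2 equations and 3 unknowns:
\[
\begin{aligned}
k_1 + 3k_2 + 5k_3 &= 0 \\
2k_1 + 4k_2 + 6k_3 &= 0
\end{aligned}
\]
% More unknowns than equations, so nontrivial solutions exist.
% One of them is (k_1, k_2, k_3) = (1, -2, 1):
\[
w_1 - 2w_2 + w_3 = (1 - 6 + 5,\; 2 - 8 + 6) = (0, 0)
\]
```

The nontrivial solution exhibits the dependency directly, which is what the general proof guarantees without having to compute it.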

Proof of Theorem 4.5.2(b) Let $S' = \{w_1, w_2, \ldots, w_m\}$ be any set of m vectors in V, where m < n. We want to show that S' does not span V. We will do this by showing that the assumption that S' spans V leads to a contradiction of the linear independence of $\{v_1, v_2, \ldots, v_n\}$. If S' spans V, then every vector in V is a linear combination of the vectors in S'. In particular, each basis vector $v_i$ is a linear combination of the vectors in S', say

\[
\begin{aligned}
v_1 &= a_{11}w_1 + a_{21}w_2 + \cdots + a_{m1}w_m \\
v_2 &= a_{12}w_1 + a_{22}w_2 + \cdots + a_{m2}w_m \\
&\;\;\vdots \\
v_n &= a_{1n}w_1 + a_{2n}w_2 + \cdots + a_{mn}w_m
\end{aligned}
\tag{4}
\]

To obtain our contradiction, we will show that there are scalars $k_1, k_2, \ldots, k_n$, not all zero, such that

\[
k_1v_1 + k_2v_2 + \cdots + k_nv_n = 0 \tag{5}
\]

But (4) and (5) have the same form as (1) and (2) except that m and n are interchanged and the w's and v's are interchanged. Thus, the computations that led to (3) now yield

\[
\begin{aligned}
a_{11}k_1 + a_{12}k_2 + \cdots + a_{1n}k_n &= 0 \\
a_{21}k_1 + a_{22}k_2 + \cdots + a_{2n}k_n &= 0 \\
&\;\;\vdots \\
a_{m1}k_1 + a_{m2}k_2 + \cdots + a_{mn}k_n &= 0
\end{aligned}
\]

This linear system has more unknowns than equations and hence has nontrivial solutions by Theorem 1.2.2.

Proof of Theorem 4.5.3(a) Assume that $S = \{v_1, v_2, \ldots, v_r\}$ is a linearly independent set of vectors in V, and v is a vector in V outside of span(S). To show that $S' = \{v_1, v_2, \ldots, v_r, v\}$ is a linearly independent set, we must show that the only scalars that satisfy

\[
k_1v_1 + k_2v_2 + \cdots + k_rv_r + k_{r+1}v = 0 \tag{6}
\]

are $k_1 = k_2 = \cdots = k_r = k_{r+1} = 0$. But it must be true that $k_{r+1} = 0$, for otherwise we could solve (6) for v as a linear combination of $v_1, v_2, \ldots, v_r$, contradicting the assumption that v is outside of span(S). Thus, (6) simplifies to

\[
k_1v_1 + k_2v_2 + \cdots + k_rv_r = 0 \tag{7}
\]

which, by the linear independence of $\{v_1, v_2, \ldots, v_r\}$, implies that

$k_1 = k_2 = \cdots = k_r = 0$

Proof of Theorem 4.5.3(b) Assume that $S = \{v_1, v_2, \ldots, v_r\}$ is a set of vectors in V, and (to be specific) suppose that $v_r$ is a linear combination of $v_1, v_2, \ldots, v_{r-1}$, say

\[
v_r = c_1v_1 + c_2v_2 + \cdots + c_{r-1}v_{r-1} \tag{8}
\]

We want to show that if $v_r$ is removed from S, then the remaining set of vectors $\{v_1, v_2, \ldots, v_{r-1}\}$ still spans span(S); that is, we must show that every vector w in span(S) is expressible as a linear combination of $\{v_1, v_2, \ldots, v_{r-1}\}$. But if w is in span(S), then w is expressible in the form

\[
w = k_1v_1 + k_2v_2 + \cdots + k_{r-1}v_{r-1} + k_rv_r
\]

or, on substituting (8),

\[
w = k_1v_1 + k_2v_2 + \cdots + k_{r-1}v_{r-1} + k_r(c_1v_1 + c_2v_2 + \cdots + c_{r-1}v_{r-1})
\]

which expresses w as a linear combination of $v_1, v_2, \ldots, v_{r-1}$.

Proof of Theorem 4.5.5(a) If S is a set of vectors that spans V but is not a basis for V, then S is a linearly dependent set. Thus some vector v in S is expressible as a linear combination of the other vectors in S. By the Plus/Minus Theorem (4.5.3b), we can remove v from S, and the resulting set S' will still span V. If S' is linearly independent, then S' is a basis for V, and we are done. If S' is linearly dependent, then we can remove some appropriate vector from S' to produce a set S'' that still spans V. We can continue removing vectors in this way until we finally arrive at a set of vectors in S that is linearly independent and spans V. This subset of S is a basis for V.

Proof of Theorem 4.5.5(b) Suppose that dim(V) = n. If S is a linearly independent set that is not already a basis for V, then S fails to span V, so there is some vector v in V that is not in span(S). By the Plus/Minus Theorem (4.5.3a), we can insert v into S, and the resulting set S' will still be linearly independent. If S' spans V, then S' is a basis for V, and we are finished. If S' does not span V, then we can insert an appropriate vector into S' to produce a set S'' that is still linearly independent. We can continue inserting vectors in this way until we reach a set with n linearly independent vectors in V. This set will be a basis for V by Theorem 4.5.4.


Concept  Review 

Dimension 

Relationships  among  the  concepts  of  linear  independence,  basis,  and  dimension 

Skills 

Find  a basis  for  and  the  dimension  of  the  solution  space  of  a homogeneous  linear  system. 

Use  dimension  to  determine  whether  a set  of  vectors  is  a basis  for  a finite-dimensional  vector  space. 
Extend  a linearly  independent  set  to  a basis. 


Exercise  Set  4.5 

In  Exercises  1-6,  find  a basis  for  the  solution  space  of  the  homogeneous  linear  system,  and  find  the 
dimension  of  that  space. 

1. $x_1 + x_2 - x_3 = 0$
$-2x_1 - x_2 + 2x_3 = 0$
$-x_1 + x_3 = 0$

Answer:

Basis: (1, 0, 1); dimension = 1

2. $3x_1 + x_2 + x_3 + x_4 = 0$
$5x_1 - x_2 + x_3 - x_4 = 0$

3. $x_1 - 4x_2 + 3x_3 - x_4 = 0$
$2x_1 - 8x_2 + 6x_3 - 2x_4 = 0$

Answer:

Basis: (4, 1, 0, 0), (-3, 0, 1, 0), (1, 0, 0, 1); dimension = 3

4. $x_1 - 3x_2 + x_3 = 0$
$2x_1 - 6x_2 + 2x_3 = 0$
$3x_1 - 9x_2 + 3x_3 = 0$

5. $2x_1 + x_2 + 3x_3 = 0$
$x_1 + 5x_3 = 0$
$x_2 + x_3 = 0$

Answer:

No basis; dimension = 0

6. $x + y + z = 0$
$3x + 2y - 2z = 0$
$4x + 3y - z = 0$
$6x + 5y + z = 0$

7. Find bases for the following subspaces of $R^3$.

(a) The plane $3x - 2y + 5z = 0$.

(b) The plane $x - y = 0$.

(c) The line $x = 2t$, $y = -t$, $z = 4t$.

(d) All vectors of the form (a, b, c), where b = a + c.


Answer:

(b) (1, 1, 0), (0, 0, 1)

(c) (2, -1, 4)

(d) (1, 1, 0), (0, 1, 1)


8. Find the dimensions of the following subspaces of $R^4$.

(a) All vectors of the form (a, b, c, 0).

(b) All vectors of the form (a, b, c, d), where d = a + b and c = a - b.

(c) All vectors of the form (a, b, c, d), where a = b = c = d.

9. Find the dimension of each of the following vector spaces.

(a) The vector space of all diagonal $n \times n$ matrices.

(b) The vector space of all symmetric $n \times n$ matrices.

(c) The vector space of all upper triangular $n \times n$ matrices.

Answer:

(a) $n$

(b) $n(n + 1)/2$

(c) $n(n + 1)/2$

10. Find the dimension of the subspace of $P_3$ consisting of all polynomials $a_0 + a_1x + a_2x^2 + a_3x^3$ for which $a_0 = 0$.

11. (a) Show that the set W of all polynomials in $P_2$ such that $p(1) = 0$ is a subspace of $P_3$.

(b) Make a conjecture about the dimension of W.

(c) Confirm your conjecture by finding a basis for W.

12. Find a standard basis vector for $R^3$ that can be added to the set $\{v_1, v_2\}$ to produce a basis for $R^3$.

(a) $v_1 = (-1, 2, 3)$, $v_2 = (1, -2, -2)$

(b) $v_1 = (1, -1, 0)$, $v_2 = (3, 1, -2)$

13. Find standard basis vectors for $R^4$ that can be added to the set $\{v_1, v_2\}$ to produce a basis for $R^4$.

$v_1 = (1, -4, 2, -3)$, $v_2 = (-3, 8, -4, 6)$


Answer: 

Any  two  of  (0,  1,  0,  0),  (0,  0,  1,  0),  and  (0,  0,  0,  1)  can  be  used. 

14. Let $\{v_1, v_2, v_3\}$ be a basis for a vector space V. Show that $\{u_1, u_2, u_3\}$ is also a basis, where $u_1 = v_1$, $u_2 = v_1 + v_2$, and $u_3 = v_1 + v_2 + v_3$.

15. The vectors $v_1 = (1, -2, 3)$ and $v_2 = (0, 5, -3)$ are linearly independent. Enlarge $\{v_1, v_2\}$ to a basis for $R^3$.

Answer:

$v_3 = (a, b, c)$ with $9a - 3b - 5c \ne 0$

16. The vectors $v_1 = (1, -2, 3, -5)$ and $v_2 = (0, -1, 2, -3)$ are linearly independent. Enlarge $\{v_1, v_2\}$ to a basis for $R^4$.

17. (a) Show that for every positive integer n, one can find n + 1 linearly independent vectors in $F(-\infty, \infty)$. [Hint: Look for polynomials.]

(b) Use the result in part (a) to prove that $F(-\infty, \infty)$ is infinite-dimensional.

(c) Prove that $C(-\infty, \infty)$, $C^m(-\infty, \infty)$, and $C^\infty(-\infty, \infty)$ are infinite-dimensional vector spaces.

18. Let S be a basis for an n-dimensional vector space V. Show that if $v_1, v_2, \ldots, v_r$ form a linearly independent set of vectors in V, then the coordinate vectors $(v_1)_S, (v_2)_S, \ldots, (v_r)_S$ form a linearly independent set in $R^n$, and conversely.

19. Using the notation from Exercise 18, show that if the vectors $v_1, v_2, \ldots, v_r$ span V, then the coordinate vectors $(v_1)_S, (v_2)_S, \ldots, (v_r)_S$ span $R^n$, and conversely.

20. Find a basis for the subspace of $P_2$ spanned by the given vectors.

(a) $-1 + x - 2x^2$, $3 + 3x + 6x^2$, $9$

(b) $1 + x$, $x^2$, $-2 + 2x^2$, $-3x$

(c) $1 + x - 3x^2$, $2 + 2x - 6x^2$, $3 + 3x - 9x^2$

[Hint:  Let  S be  the  standard  basis  for  P2,  and  work  with  the  coordinate  vectors  relative  to  S as  in  Exercises 
18  and  19.] 

21.  Prove:  A subspace  of  a finite-dimensional  vector  space  is  finite-dimensional. 

22.  State  the  two  parts  of  Theorem  4.5.2  in  contrapositive  form. 

True-False  Exercises 

In  parts  (a)-(j)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  The  zero  vector  space  has  dimension  zero. 

Answer: 

True 

(b) There is a set of 17 linearly independent vectors in $R^{17}$.

Answer:

True

(c) There is a set of 11 vectors that span $R^{17}$.

Answer: 

False 

(d) Every linearly independent set of five vectors in $R^5$ is a basis for $R^5$.

Answer:

True

(e) Every set of five vectors that spans $R^5$ is a basis for $R^5$.

Answer: 

True 

(f)  Every  set  of  vectors  that  spans  Rn  contains  a basis  for  Rn. 

Answer: 


True 


(g)  Every  linearly  independent  set  of  vectors  in  Rn  is  contained  in  some  basis  for  Rn. 

Answer: 

True 

(h)  There  is  a basis  for  M22  consisting  of  invertible  matrices. 

Answer: 

True 

(i) If A has size $n \times n$ and $I_n, A, A^2, \ldots, A^{n^2}$ are distinct matrices, then $\{I_n, A, A^2, \ldots, A^{n^2}\}$ is linearly dependent.

Answer:

True

(j) There are at least two distinct three-dimensional subspaces of $P_2$.

Answer:

False
4.6  Change of Basis

A basis  that  is  suitable  for  one  problem  may  not  be  suitable  for  another,  so  it  is  a common  process  in  the  study 
of  vector  spaces  to  change  from  one  basis  to  another.  Because  a basis  is  the  vector  space  generalization  of  a 
coordinate system, changing bases is akin to changing coordinate axes in $R^2$ and $R^3$. In this section we will study problems related to change of basis.


Coordinate  Maps 

If $S = \{v_1, v_2, \ldots, v_n\}$ is a basis for a finite-dimensional vector space V, and if

$(v)_S = (c_1, c_2, \ldots, c_n)$

is the coordinate vector of v relative to S, then, as observed in Section 4.4, the mapping

\[
v \to (v)_S \tag{1}
\]

creates a connection (a one-to-one correspondence) between vectors in the general vector space V and vectors in the familiar vector space $R^n$. We call (1) the coordinate map from V to $R^n$. In this section we will find it convenient to express coordinate vectors in the matrix form

\[
[v]_S = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \tag{2}
\]

where the square brackets emphasize the matrix notation (Figure 4.6.1).

Figure 4.6.1 Coordinate map $v \to [v]_S$ from V to $R^n$


Change  of  Basis 

There  are  many  applications  in  which  it  is  necessary  to  work  with  more  than  one  coordinate  system.  In  such 
cases  it  becomes  important  to  know  how  the  coordinates  of  a fixed  vector  relative  to  each  coordinate  system 
are  related.  This  leads  to  the  following  problem. 


The  Change-of-Basis  Problem 

If v is a vector in a finite-dimensional vector space V, and if we change the basis for V from a basis B to a basis B', how are the coordinate vectors $[v]_B$ and $[v]_{B'}$ related?

To  solve  this  problem,  it  will  be  convenient  to  refer  to  B as  the  “old  basis”  and  B'  as  the  “new 
basis.”  Thus,  our  objective  is  to  find  a relationship  between  the  old  and  new  coordinates  of  a fixed  vector  v in 
V. 


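For simplicity, we will solve this problem for two-dimensional spaces. The solution for n-dimensional spaces is similar. Let

$B = \{u_1, u_2\}$ and $B' = \{u'_1, u'_2\}$

be the old and new bases, respectively. We will need the coordinate vectors for the new basis vectors relative to the old basis. Suppose they are

\[
[u'_1]_B = \begin{bmatrix} a \\ b \end{bmatrix} \quad\text{and}\quad [u'_2]_B = \begin{bmatrix} c \\ d \end{bmatrix} \tag{3}
\]

That is,

\[
\begin{aligned}
u'_1 &= au_1 + bu_2 \\
u'_2 &= cu_1 + du_2
\end{aligned}
\tag{4}
\]

Now let v be any vector in V, and let

\[
[v]_{B'} = \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} \tag{5}
\]

be the new coordinate vector, so that

\[
v = k_1u'_1 + k_2u'_2 \tag{6}
\]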

In order to find the old coordinates of v, we must express v in terms of the old basis B. To do this, we substitute (4) into (6). This yields

\[
v = k_1(au_1 + bu_2) + k_2(cu_1 + du_2)
\]

or

\[
v = (k_1a + k_2c)u_1 + (k_1b + k_2d)u_2
\]

Thus, the old coordinate vector for v is

\[
[v]_B = \begin{bmatrix} k_1a + k_2c \\ k_1b + k_2d \end{bmatrix}
\]

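which, by using (5), can be written as

\[
[v]_B = \begin{bmatrix} a & c \\ b & d \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} = \begin{bmatrix} a & c \\ b & d \end{bmatrix} [v]_{B'}
\]

This equation states that the old coordinate vector $[v]_B$ results when we multiply the new coordinate vector $[v]_{B'}$ on the left by the matrix

\[
P = \begin{bmatrix} a & c \\ b & d \end{bmatrix}
\]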

Since  the  columns  of  this  matrix  are  the  coordinates  of  the  new  basis  vectors  relative  to  the  old  basis  [see  3] 
we  have  the  following  solution  of  the  change-of-basis  problem. 


Solution  of  the  Change-of-Basis  Problem 

If we change the basis for a vector space V from an old basis $B = \{u_1, u_2, \ldots, u_n\}$ to a new basis $B' = \{u'_1, u'_2, \ldots, u'_n\}$, then for each vector v in V, the old coordinate vector $[v]_B$ is related to the new coordinate vector $[v]_{B'}$ by the equation

\[
[v]_B = P[v]_{B'} \tag{7}
\]

where the columns of P are the coordinate vectors of the new basis vectors relative to the old basis; that is, the column vectors of P are

\[
[u'_1]_B,\; [u'_2]_B,\; \ldots,\; [u'_n]_B \tag{8}
\]


Transition  Matrices 

The matrix P in Equation (7) is called the transition matrix from B' to B. For emphasis, we will often denote it by $P_{B' \to B}$. It follows from (8) that this matrix can be expressed in terms of its column vectors as

\[
P_{B' \to B} = \bigl[\, [u'_1]_B \mid [u'_2]_B \mid \cdots \mid [u'_n]_B \,\bigr] \tag{9}
\]

Similarly, the transition matrix from B to B' can be expressed in terms of its column vectors as

\[
P_{B \to B'} = \bigl[\, [u_1]_{B'} \mid [u_2]_{B'} \mid \cdots \mid [u_n]_{B'} \,\bigr] \tag{10}
\]


There  is  a simple  way  to  remember  both  of  these  formulas  using  the  terms  “old  basis”  and  “new 
basis”  defined  earlier  in  this  section:  In  Formula  9 the  old  basis  is  Bf  and  the  new  basis  is  B,  whereas  in 
Formula  10  the  old  basis  is  B and  the  new  basis  is  B1 ■ Thus,  both  formulas  can  be  restated  as  follows: 


The  columns  of  the  transition  matrix  from  an  old  basis  to  a new  basis  are  the  coordinate  vectors  of  the 
old  basis  relative  to  the  new  basis. 


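EXAMPLE 1 Finding Transition Matrices

Consider the bases $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$ for $R^2$, where

$$\mathbf{u}_1 = (1, 0), \quad \mathbf{u}_2 = (0, 1), \quad \mathbf{u}_1' = (1, 1), \quad \mathbf{u}_2' = (2, 1)$$

(a) Find the transition matrix $P_{B' \to B}$ from $B'$ to $B$.

(b) Find the transition matrix $P_{B \to B'}$ from $B$ to $B'$.

Solution

(a) Here the old basis vectors are $\mathbf{u}_1'$ and $\mathbf{u}_2'$ and the new basis vectors are $\mathbf{u}_1$ and $\mathbf{u}_2$. We want to find the coordinate matrices of the old basis vectors $\mathbf{u}_1'$ and $\mathbf{u}_2'$ relative to the new basis vectors $\mathbf{u}_1$ and $\mathbf{u}_2$. To do this, first we observe that

$$\mathbf{u}_1' = \mathbf{u}_1 + \mathbf{u}_2, \qquad \mathbf{u}_2' = 2\mathbf{u}_1 + \mathbf{u}_2$$

from which it follows that

$$[\mathbf{u}_1']_B = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad [\mathbf{u}_2']_B = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

and hence that

$$P_{B' \to B} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}$$

(b) Here the old basis vectors are $\mathbf{u}_1$ and $\mathbf{u}_2$ and the new basis vectors are $\mathbf{u}_1'$ and $\mathbf{u}_2'$. As in part (a), we want to find the coordinate matrices of the old basis vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ relative to the new basis vectors $\mathbf{u}_1'$ and $\mathbf{u}_2'$. To do this, observe that

$$\mathbf{u}_1 = -\mathbf{u}_1' + \mathbf{u}_2', \qquad \mathbf{u}_2 = 2\mathbf{u}_1' - \mathbf{u}_2'$$

from which it follows that

$$[\mathbf{u}_1]_{B'} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \quad\text{and}\quad [\mathbf{u}_2]_{B'} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$$

and hence that

$$P_{B \to B'} = \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}$$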


Suppose now that $B$ and $B'$ are bases for a finite-dimensional vector space $V$. Since multiplication by $P_{B' \to B}$ maps coordinate vectors relative to the basis $B'$ into coordinate vectors relative to the basis $B$, and $P_{B \to B'}$ maps coordinate vectors relative to $B$ into coordinate vectors relative to $B'$, it follows that for every vector $\mathbf{v}$ in $V$ we have

$$[\mathbf{v}]_B = P_{B' \to B}[\mathbf{v}]_{B'} \tag{11}$$

$$[\mathbf{v}]_{B'} = P_{B \to B'}[\mathbf{v}]_B \tag{12}$$

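EXAMPLE 2 Computing Coordinate Vectors

Let $B$ and $B'$ be the bases in Example 1. Use an appropriate formula to find $[\mathbf{v}]_B$ given that

$$[\mathbf{v}]_{B'} = \begin{bmatrix} -3 \\ 5 \end{bmatrix}$$

Solution

To find $[\mathbf{v}]_B$ we need to make the transition from $B'$ to $B$. It follows from Formula 11 and part (a) of Example 1 that

$$[\mathbf{v}]_B = P_{B' \to B}[\mathbf{v}]_{B'} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 7 \\ 2 \end{bmatrix}$$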
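The computation in Example 2 can be checked numerically. The following is a minimal Python sketch, not part of the text; the helper names `mat_vec` and `combine` and the variable names are illustrative assumptions. It multiplies the transition matrix from Example 1 by the given coordinate vector and then confirms that both coordinate vectors describe the same vector of $R^2$.

```python
# Minimal sketch: verify Example 2 with plain matrix-vector multiplication.
# mat_vec, combine, and the variable names are illustrative, not from the text.

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def combine(coords, basis):
    """Form the linear combination coords[0]*basis[0] + coords[1]*basis[1] + ..."""
    return [sum(c * vec[i] for c, vec in zip(coords, basis))
            for i in range(len(basis[0]))]

B = [(1, 0), (0, 1)]               # basis B from Example 1
B_prime = [(1, 1), (2, 1)]         # basis B' from Example 1
P_Bprime_to_B = [[1, 2], [1, 1]]   # transition matrix from part (a) of Example 1

v_Bprime = [-3, 5]                         # given coordinates relative to B'
v_B = mat_vec(P_Bprime_to_B, v_Bprime)     # Formula 11

# Both coordinate vectors must rebuild the same vector v.
v_from_B = combine(v_B, B)
v_from_Bprime = combine(v_Bprime, B_prime)
print(v_B, v_from_B, v_from_Bprime)
```

Rebuilding the vector from each basis is a useful sanity check whenever a transition matrix has been computed by hand.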

Invertibility  of  Transition  Matrices 

If $B$ and $B'$ are bases for a finite-dimensional vector space $V$, then

$$(P_{B' \to B})(P_{B \to B'}) = P_{B \to B}$$

because multiplication by $(P_{B' \to B})(P_{B \to B'})$ first maps $B$-coordinates of a vector into $B'$-coordinates, and then maps those $B'$-coordinates back into the original $B$-coordinates. Since the net effect of the two operations is to leave each coordinate vector unchanged, we are led to conclude that $P_{B \to B}$ must be the identity matrix, that is,

$$(P_{B' \to B})(P_{B \to B'}) = I \tag{13}$$

(we omit the formal proof). For example, for the transition matrices obtained in Example 1 we have

$$(P_{B' \to B})(P_{B \to B'}) = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$$

It follows from 13 that $P_{B' \to B}$ is invertible and that its inverse is $P_{B \to B'}$. Thus, we have the following theorem.


THEOREM 4.6.1

If $P$ is the transition matrix from a basis $B'$ to a basis $B$ for a finite-dimensional vector space $V$, then $P$ is invertible and $P^{-1}$ is the transition matrix from $B$ to $B'$.
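As a quick numerical illustration of Theorem 4.6.1, the sketch below multiplies the two transition matrices from Example 1 in both orders. It is only a check under the assumption that the matrices are stored as lists of rows; `mat_mul` is a hypothetical helper name, not something defined in the text.

```python
# Minimal sketch: the two transition matrices from Example 1 are inverses.
# mat_mul is an illustrative helper, not from the text.

def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

P = [[1, 2], [1, 1]]      # transition matrix from B' to B
Q = [[-1, 2], [1, -1]]    # transition matrix from B to B'
identity = [[1, 0], [0, 1]]

product_PQ = mat_mul(P, Q)
product_QP = mat_mul(Q, P)
print(product_PQ, product_QP)
```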


An Efficient Method for Computing Transition Matrices for $R^n$

Our next objective is to develop an efficient procedure for computing transition matrices between bases for $R^n$. As illustrated in Example 1, the first step in computing a transition matrix is to express each old basis vector as a linear combination of the new basis vectors. For $R^n$ this involves solving $n$ linear systems of $n$ equations in $n$ unknowns, each of which has the same coefficient matrix (why?). An efficient way to do this is by the method illustrated in Example 2 of Section 1.6, which is as follows:


A Procedure for Computing $P_{B \to B'}$

Step 1 Form the matrix $[B' \mid B]$.

Step 2 Use elementary row operations to reduce the matrix in Step 1 to reduced row echelon form.

Step 3 The resulting matrix will be $[I \mid P_{B \to B'}]$.

Step 4 Extract the matrix $P_{B \to B'}$ from the right side of the matrix in Step 3.


This procedure is captured in the following diagram.

$$[\text{new basis} \mid \text{old basis}] \xrightarrow{\ \text{row operations}\ } [I \mid \text{transition from old to new}] \tag{14}$$
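The four steps above can be sketched in code. The following Python sketch is an illustration under stated assumptions rather than a definitive implementation: the function names `rref` and `transition_matrix` are hypothetical, basis vectors are passed as tuples, and exact arithmetic with `fractions.Fraction` stands in for hand computation.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def transition_matrix(old_basis, new_basis):
    """Row-reduce [new basis | old basis] and return the right-hand block."""
    n = len(old_basis)
    # Step 1: basis vectors are written as columns of each block.
    augmented = [[new_basis[j][i] for j in range(n)] +
                 [old_basis[j][i] for j in range(n)] for i in range(n)]
    # Steps 2 and 3: reduce to [I | P].
    reduced, pivots = rref(augmented)
    if pivots != list(range(n)):
        raise ValueError("new_basis is not a basis")
    # Step 4: extract P from the right side.
    return [row[n:] for row in reduced]

B = [(1, 0), (0, 1)]
B_prime = [(1, 1), (2, 1)]
P_B_to_Bprime = transition_matrix(old_basis=B, new_basis=B_prime)
P_Bprime_to_B = transition_matrix(old_basis=B_prime, new_basis=B)
print(P_B_to_Bprime, P_Bprime_to_B)
```

The bases used here are those of Example 1, so the two results can be compared with the matrices found there by inspection.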


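EXAMPLE 3 Example 1 Revisited

In Example 1 we considered the bases $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$ for $R^2$, where

$$\mathbf{u}_1 = (1, 0), \quad \mathbf{u}_2 = (0, 1), \quad \mathbf{u}_1' = (1, 1), \quad \mathbf{u}_2' = (2, 1)$$

(a) Use Formula 14 to find the transition matrix from $B'$ to $B$.

(b) Use Formula 14 to find the transition matrix from $B$ to $B'$.

Solution

(a) Here $B'$ is the old basis and $B$ is the new basis, so

$$[\text{new basis} \mid \text{old basis}] = \left[\begin{array}{cc|cc} 1 & 0 & 1 & 2 \\ 0 & 1 & 1 & 1 \end{array}\right]$$

Since the left side is already the identity matrix, no reduction is needed. We see by inspection that the transition matrix is

$$P_{B' \to B} = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}$$

which agrees with the result in Example 1.

(b) Here $B$ is the old basis and $B'$ is the new basis, so

$$[\text{new basis} \mid \text{old basis}] = \left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{array}\right]$$

By reducing this matrix so that the left side becomes the identity, we obtain (verify)

$$[I \mid \text{transition from old to new}] = \left[\begin{array}{cc|cc} 1 & 0 & -1 & 2 \\ 0 & 1 & 1 & -1 \end{array}\right]$$

so the transition matrix is

$$P_{B \to B'} = \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}$$

which also agrees with the result in Example 1.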

Transition to the Standard Basis for $R^n$

Note that in part (a) of the last example the column vectors of the matrix that made the transition from the basis $B'$ to the standard basis turned out to be the vectors in $B'$ written in column form. This illustrates the following general result.


THEOREM 4.6.2

Let $B' = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ be any basis for the vector space $R^n$ and let $S = \{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ be the standard basis for $R^n$. If the vectors in these bases are written in column form, then

$$P_{B' \to S} = [\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n] \tag{15}$$


It follows from this theorem that if

$$A = [\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n]$$

is any invertible $n \times n$ matrix, then $A$ can be viewed as the transition matrix from the basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ for $R^n$ to the standard basis for $R^n$. Thus, for example, the matrix

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}$$

which was shown to be invertible in Example 4 of Section 1.5, is the transition matrix from the basis

$$\mathbf{u}_1 = (1, 2, 1), \quad \mathbf{u}_2 = (2, 5, 0), \quad \mathbf{u}_3 = (3, 3, 8)$$

to the basis

$$\mathbf{e}_1 = (1, 0, 0), \quad \mathbf{e}_2 = (0, 1, 0), \quad \mathbf{e}_3 = (0, 0, 1)$$


Concept  Review 

Coordinate  map 
Change-of-basis  problem 
Transition  matrix 

Skills 

Find  coordinate  vectors  relative  to  a given  basis  directly. 
Find  the  transition  matrix  from  one  basis  to  another. 

Use  the  transition  matrix  to  compute  coordinate  vectors. 


Exercise  Set  4.6 

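1. Find the coordinate vector for $\mathbf{w}$ relative to the basis $S = \{\mathbf{u}_1, \mathbf{u}_2\}$ for $R^2$.

(a) $\mathbf{u}_1 = (1, 0)$, $\mathbf{u}_2 = (0, 1)$; $\mathbf{w} = (3, -7)$

(b) $\mathbf{u}_1 = (2, -4)$, $\mathbf{u}_2 = (3, 8)$; $\mathbf{w} = (1, 1)$

(c) $\mathbf{u}_1 = (1, 1)$, $\mathbf{u}_2 = (0, 2)$; $\mathbf{w} = (a, b)$

Answer:

(a) $[\mathbf{w}]_S = \begin{bmatrix} 3 \\ -7 \end{bmatrix}$

(b) $[\mathbf{w}]_S = \begin{bmatrix} \tfrac{5}{28} \\ \tfrac{3}{14} \end{bmatrix}$

(c) $[\mathbf{w}]_S = \begin{bmatrix} a \\ \tfrac{b-a}{2} \end{bmatrix}$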


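2. Find the coordinate vector for $\mathbf{v}$ relative to the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$.

(a) $\mathbf{v} = (2, -1, 3)$; $\mathbf{v}_1 = (1, 0, 0)$, $\mathbf{v}_2 = (2, 2, 0)$, $\mathbf{v}_3 = (3, 3, 3)$

(b) $\mathbf{v} = (5, -12, 3)$; $\mathbf{v}_1 = (1, 2, 3)$, $\mathbf{v}_2 = (-4, 5, 6)$, $\mathbf{v}_3 = (7, -8, 9)$

3. Find the coordinate vector for $\mathbf{p}$ relative to the basis $S = \{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$ for $P_2$.

(a) $\mathbf{p} = 4 - 3x + x^2$; $\mathbf{p}_1 = 1$, $\mathbf{p}_2 = x$, $\mathbf{p}_3 = x^2$

(b) $\mathbf{p} = 2 - x + x^2$; $\mathbf{p}_1 = 1 + x$, $\mathbf{p}_2 = 1 + x^2$, $\mathbf{p}_3 = x + x^2$

Answer:

(a) $(\mathbf{p})_S = (4, -3, 1)$, $[\mathbf{p}]_S = \begin{bmatrix} 4 \\ -3 \\ 1 \end{bmatrix}$

(b) $(\mathbf{p})_S = (0, 2, -1)$, $[\mathbf{p}]_S = \begin{bmatrix} 0 \\ 2 \\ -1 \end{bmatrix}$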


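4. Find the coordinate vector for $A$ relative to the basis $S = \{A_1, A_2, A_3, A_4\}$ for $M_{22}$.

$$A = \begin{bmatrix} 2 & 0 \\ -1 & 3 \end{bmatrix}; \quad A_1 = \begin{bmatrix} -1 & 1 \\ 0 & 0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad A_4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$

5. Consider the coordinate vectors

$$[\mathbf{w}]_S = \begin{bmatrix} 6 \\ -1 \\ 4 \end{bmatrix}, \quad [\mathbf{q}]_S = \begin{bmatrix} 3 \\ 0 \\ 4 \end{bmatrix}, \quad [B]_S = \begin{bmatrix} -8 \\ 7 \\ 6 \\ 3 \end{bmatrix}$$

(a) Find $\mathbf{w}$ if $S$ is the basis in Exercise 2(a).

(b) Find $\mathbf{q}$ if $S$ is the basis in Exercise 3(a).

(c) Find $B$ if $S$ is the basis in Exercise 4.

Answer:

(a) $\mathbf{w} = (16, 10, 12)$

(b) $\mathbf{q} = 3 + 4x^2$

(c) $B = \begin{bmatrix} 15 & -1 \\ 6 & 3 \end{bmatrix}$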

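6. Consider the bases $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$ for $R^2$, where

$$\mathbf{u}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad \mathbf{u}_1' = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad \mathbf{u}_2' = \begin{bmatrix} -3 \\ 4 \end{bmatrix}$$

(a) Find the transition matrix from $B'$ to $B$.

(b) Find the transition matrix from $B$ to $B'$.

(c) Compute the coordinate vector $[\mathbf{w}]_B$, where

$$\mathbf{w} = \begin{bmatrix} 3 \\ -5 \end{bmatrix}$$

and use Formula 12 to compute $[\mathbf{w}]_{B'}$.

(d) Check your work by computing $[\mathbf{w}]_{B'}$ directly.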

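7. Repeat the directions of Exercise 6 with the same vector $\mathbf{w}$ but with

$$\mathbf{u}_1 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 4 \\ -1 \end{bmatrix}, \quad \mathbf{u}_1' = \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad \mathbf{u}_2' = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$$

Answer:

(a) $\begin{bmatrix} \tfrac{13}{10} & -\tfrac{1}{2} \\ -\tfrac{2}{5} & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 0 & -\tfrac{5}{2} \\ -2 & -\tfrac{13}{2} \end{bmatrix}$

(c) $[\mathbf{w}]_B = \begin{bmatrix} -\tfrac{17}{10} \\ \tfrac{8}{5} \end{bmatrix}$, $[\mathbf{w}]_{B'} = \begin{bmatrix} -4 \\ -7 \end{bmatrix}$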


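8. Consider the bases $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ and $B' = \{\mathbf{u}_1', \mathbf{u}_2', \mathbf{u}_3'\}$ for $R^3$, where

$$\mathbf{u}_1 = \begin{bmatrix} -3 \\ 0 \\ -3 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} -3 \\ 2 \\ -1 \end{bmatrix}, \quad \mathbf{u}_3 = \begin{bmatrix} 1 \\ 6 \\ -1 \end{bmatrix}$$

$$\mathbf{u}_1' = \begin{bmatrix} -6 \\ -6 \\ 0 \end{bmatrix}, \quad \mathbf{u}_2' = \begin{bmatrix} -2 \\ -6 \\ 4 \end{bmatrix}, \quad \mathbf{u}_3' = \begin{bmatrix} -2 \\ -3 \\ 7 \end{bmatrix}$$

(a) Find the transition matrix from $B$ to $B'$.

(b) Compute the coordinate vector $[\mathbf{w}]_B$, where

$$\mathbf{w} = \begin{bmatrix} -5 \\ 8 \\ -5 \end{bmatrix}$$

and use 12 to compute $[\mathbf{w}]_{B'}$.

(c) Check your work by computing $[\mathbf{w}]_{B'}$ directly.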

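9. Repeat the directions of Exercise 8 with the same vector $\mathbf{w}$, but with

$$\mathbf{u}_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}, \quad \mathbf{u}_3 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$$

$$\mathbf{u}_1' = \begin{bmatrix} 3 \\ 1 \\ -5 \end{bmatrix}, \quad \mathbf{u}_2' = \begin{bmatrix} 1 \\ 1 \\ -3 \end{bmatrix}, \quad \mathbf{u}_3' = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}$$

Answer:

(a) $\begin{bmatrix} 3 & 2 & \tfrac{5}{2} \\ -2 & -3 & -\tfrac{1}{2} \\ 5 & 1 & 6 \end{bmatrix}$

(b) $[\mathbf{w}]_B = \begin{bmatrix} 9 \\ -9 \\ -5 \end{bmatrix}$, $[\mathbf{w}]_{B'} = \begin{bmatrix} -\tfrac{7}{2} \\ \tfrac{23}{2} \\ 6 \end{bmatrix}$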

10. Consider the bases $B = \{\mathbf{p}_1, \mathbf{p}_2\}$ and $B' = \{\mathbf{q}_1, \mathbf{q}_2\}$ for $P_1$, where

$$\mathbf{p}_1 = 6 + 3x, \quad \mathbf{p}_2 = 10 + 2x, \quad \mathbf{q}_1 = 2, \quad \mathbf{q}_2 = 3 + 2x$$

(a) Find the transition matrix from $B'$ to $B$.

(b) Find the transition matrix from $B$ to $B'$.

(c) Compute the coordinate vector $[\mathbf{p}]_B$, where $\mathbf{p} = -4 + x$, and use 12 to compute $[\mathbf{p}]_{B'}$.

(d) Check your work by computing $[\mathbf{p}]_{B'}$ directly.

11. Let $V$ be the space spanned by $\mathbf{f}_1 = \sin x$ and $\mathbf{f}_2 = \cos x$.

(a) Show that $\mathbf{g}_1 = 2\sin x + \cos x$ and $\mathbf{g}_2 = 3\cos x$ form a basis for $V$.

(b) Find the transition matrix from $B' = \{\mathbf{g}_1, \mathbf{g}_2\}$ to $B = \{\mathbf{f}_1, \mathbf{f}_2\}$.

(c) Find the transition matrix from $B$ to $B'$.

(d) Compute the coordinate vector $[\mathbf{h}]_B$, where $\mathbf{h} = 2\sin x - 5\cos x$, and use 12 to obtain $[\mathbf{h}]_{B'}$.

(e) Check your work by computing $[\mathbf{h}]_{B'}$ directly.


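Answer:

(b) $\begin{bmatrix} 2 & 0 \\ 1 & 3 \end{bmatrix}$

(c) $\begin{bmatrix} \tfrac{1}{2} & 0 \\ -\tfrac{1}{6} & \tfrac{1}{3} \end{bmatrix}$

(d) $[\mathbf{h}]_B = \begin{bmatrix} 2 \\ -5 \end{bmatrix}$, $[\mathbf{h}]_{B'} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$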


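12. Let $S$ be the standard basis for $R^2$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2\}$ be the basis in which $\mathbf{v}_1 = (2, 1)$ and $\mathbf{v}_2 = (-3, 4)$.

(a) Find the transition matrix $P_{B \to S}$ by inspection.

(b) Use Formula 14 to find the transition matrix $P_{S \to B}$.

(c) Confirm that $P_{B \to S}$ and $P_{S \to B}$ are inverses of one another.

(d) Let $\mathbf{w} = (5, -3)$. Find $[\mathbf{w}]_B$ and then use Formula 11 to compute $[\mathbf{w}]_S$.

(e) Let $\mathbf{w} = (3, -5)$. Find $[\mathbf{w}]_S$ and then use Formula 12 to compute $[\mathbf{w}]_B$.

13. Let $S$ be the standard basis for $R^3$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be the basis in which $\mathbf{v}_1 = (1, 2, 1)$, $\mathbf{v}_2 = (2, 5, 0)$, and $\mathbf{v}_3 = (3, 3, 8)$.

(a) Find the transition matrix $P_{B \to S}$ by inspection.

(b) Use Formula 14 to find the transition matrix $P_{S \to B}$.

(c) Confirm that $P_{B \to S}$ and $P_{S \to B}$ are inverses of one another.

(d) Let $\mathbf{w} = (5, -3, 1)$. Find $[\mathbf{w}]_B$ and then use Formula 11 to compute $[\mathbf{w}]_S$.

(e) Let $\mathbf{w} = (3, -5, 0)$. Find $[\mathbf{w}]_S$ and then use Formula 12 to compute $[\mathbf{w}]_B$.

Answer:

(a) $\begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}$

(b) $\begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix}$

(d) $[\mathbf{w}]_B = \begin{bmatrix} -239 \\ 77 \\ 30 \end{bmatrix}$, $[\mathbf{w}]_S = \begin{bmatrix} 5 \\ -3 \\ 1 \end{bmatrix}$

(e) $[\mathbf{w}]_S = \begin{bmatrix} 3 \\ -5 \\ 0 \end{bmatrix}$, $[\mathbf{w}]_B = \begin{bmatrix} -200 \\ 64 \\ 25 \end{bmatrix}$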

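14. Let $B_1 = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B_2 = \{\mathbf{v}_1, \mathbf{v}_2\}$ be the bases for $R^2$ in which $\mathbf{u}_1 = (2, 2)$, $\mathbf{u}_2 = (4, -1)$, $\mathbf{v}_1 = (1, 3)$, and $\mathbf{v}_2 = (-1, -1)$.

(a) Use Formula 14 to find the transition matrix $P_{B_2 \to B_1}$.

(b) Use Formula 14 to find the transition matrix $P_{B_1 \to B_2}$.

(c) Confirm that $P_{B_2 \to B_1}$ and $P_{B_1 \to B_2}$ are inverses of one another.

(d) Let $\mathbf{w} = (5, -3)$. Find $[\mathbf{w}]_{B_1}$ and then use the matrix $P_{B_1 \to B_2}$ to compute $[\mathbf{w}]_{B_2}$ from $[\mathbf{w}]_{B_1}$.

(e) Let $\mathbf{w} = (3, -5)$. Find $[\mathbf{w}]_{B_2}$ and then use the matrix $P_{B_2 \to B_1}$ to compute $[\mathbf{w}]_{B_1}$ from $[\mathbf{w}]_{B_2}$.

15. Let $B_1 = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B_2 = \{\mathbf{v}_1, \mathbf{v}_2\}$ be the bases for $R^2$ in which $\mathbf{u}_1 = (1, 2)$, $\mathbf{u}_2 = (2, 3)$, $\mathbf{v}_1 = (1, 3)$, and $\mathbf{v}_2 = (1, 4)$.

(a) Use Formula 14 to find the transition matrix $P_{B_2 \to B_1}$.

(b) Use Formula 14 to find the transition matrix $P_{B_1 \to B_2}$.

(c) Confirm that $P_{B_2 \to B_1}$ and $P_{B_1 \to B_2}$ are inverses of one another.

(d) Let $\mathbf{w} = (0, 1)$. Find $[\mathbf{w}]_{B_1}$ and then use the matrix $P_{B_1 \to B_2}$ to compute $[\mathbf{w}]_{B_2}$ from $[\mathbf{w}]_{B_1}$.

(e) Let $\mathbf{w} = (2, 5)$. Find $[\mathbf{w}]_{B_2}$ and then use the matrix $P_{B_2 \to B_1}$ to compute $[\mathbf{w}]_{B_1}$ from $[\mathbf{w}]_{B_2}$.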


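Answer:

(a) $\begin{bmatrix} 3 & 5 \\ -1 & -2 \end{bmatrix}$

(b) $\begin{bmatrix} 2 & 5 \\ -1 & -3 \end{bmatrix}$

(d) $[\mathbf{w}]_{B_1} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$, $[\mathbf{w}]_{B_2} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$

(e) $[\mathbf{w}]_{B_2} = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$, $[\mathbf{w}]_{B_1} = \begin{bmatrix} 4 \\ -1 \end{bmatrix}$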


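16. Let $B_1 = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ and $B_2 = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be the bases for $R^3$ in which $\mathbf{u}_1 = (-3, 0, -3)$, $\mathbf{u}_2 = (-3, 2, -1)$, $\mathbf{u}_3 = (1, 6, -1)$, $\mathbf{v}_1 = (-6, -6, 0)$, $\mathbf{v}_2 = (-2, -6, 4)$, and $\mathbf{v}_3 = (-2, -3, 7)$.

(a) Find the transition matrix $P_{B_1 \to B_2}$.

(b) Let $\mathbf{w} = (-5, 8, -5)$. Find $[\mathbf{w}]_{B_1}$ and then use the transition matrix obtained in part (a) to compute $[\mathbf{w}]_{B_2}$ by matrix multiplication.

(c) Check the result in part (b) by computing $[\mathbf{w}]_{B_2}$ directly.

17. Follow the directions of Exercise 16 with the same vector $\mathbf{w}$ but with $\mathbf{u}_1 = (2, 1, 1)$, $\mathbf{u}_2 = (2, -1, 1)$, $\mathbf{u}_3 = (1, 2, 1)$, $\mathbf{v}_1 = (3, 1, -5)$, $\mathbf{v}_2 = (1, 1, -3)$, and $\mathbf{v}_3 = (-1, 0, 2)$.

Answer:

(b) $[\mathbf{w}]_{B_1} = \begin{bmatrix} 9 \\ -9 \\ -5 \end{bmatrix}$, $[\mathbf{w}]_{B_2} = \begin{bmatrix} -\tfrac{7}{2} \\ \tfrac{23}{2} \\ 6 \end{bmatrix}$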

18. Let $S = \{\mathbf{e}_1, \mathbf{e}_2\}$ be the standard basis for $R^2$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2\}$ be the basis that results when the vectors in $S$ are reflected about the line $y = x$.

(a) Find the transition matrix $P_{B \to S}$.

(b) Let $P = P_{B \to S}$ and show that $P^T = P_{S \to B}$.

19. Let $S = \{\mathbf{e}_1, \mathbf{e}_2\}$ be the standard basis for $R^2$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2\}$ be the basis that results when the vectors in $S$ are reflected about the line that makes an angle $\theta$ with the positive $x$-axis.

(a) Find the transition matrix $P_{B \to S}$.

(b) Let $P = P_{B \to S}$ and show that $P^T = P_{S \to B}$.

Answer:

(a) $\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}$

20. If $B_1$, $B_2$, and $B_3$ are bases for $R^2$, and if

Pb1-+B2=  5 2 and  P*2-*3  = \ 

then $P_{B_3 \to B_1} =$ ____

21. If $P$ is the transition matrix from a basis $B'$ to a basis $B$, and $Q$ is the transition matrix from $B$ to a basis $C$, what is the transition matrix from $B'$ to $C$? What is the transition matrix from $C$ to $B'$?

22. To write the coordinate vector for a vector, it is necessary to specify an order for the vectors in the basis. If $P$ is the transition matrix from a basis $B'$ to a basis $B$, what is the effect on $P$ if we reverse the order of vectors in $B$ from $\mathbf{v}_1, \ldots, \mathbf{v}_n$ to $\mathbf{v}_n, \ldots, \mathbf{v}_1$? What is the effect on $P$ if we reverse the order of vectors in both $B'$ and $B$?

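23. Consider the matrix

$$P = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 2 \\ 0 & 2 & 1 \end{bmatrix}$$

(a) $P$ is the transition matrix from what basis $B$ to the standard basis $S = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ for $R^3$?

(b) $P$ is the transition matrix from the standard basis $S = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ to what basis $B$ for $R^3$?

Answer:

(a) $B = \{(1, 1, 0),\ (1, 0, 2),\ (0, 2, 1)\}$

(b) $B = \left\{\left(\tfrac{4}{5}, \tfrac{1}{5}, -\tfrac{2}{5}\right),\ \left(\tfrac{1}{5}, -\tfrac{1}{5}, \tfrac{2}{5}\right),\ \left(-\tfrac{2}{5}, \tfrac{2}{5}, \tfrac{1}{5}\right)\right\}$

24. The matrix

$$P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 2 \\ 0 & 1 & 1 \end{bmatrix}$$

is the transition matrix from what basis $B$ to the basis $\{(1, 1, 1),\ (1, 1, 0),\ (1, 0, 0)\}$ for $R^3$?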


25. Let $B$ be a basis for $R^n$. Prove that the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ form a linearly independent set in $R^n$ if and only if the vectors $[\mathbf{v}_1]_B, [\mathbf{v}_2]_B, \ldots, [\mathbf{v}_k]_B$ form a linearly independent set in $R^n$.

26. Let $B$ be a basis for $R^n$. Prove that the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ span $R^n$ if and only if the vectors $[\mathbf{v}_1]_B, [\mathbf{v}_2]_B, \ldots, [\mathbf{v}_k]_B$ span $R^n$.

27. If $[\mathbf{w}]_B = \mathbf{w}$ holds for all vectors $\mathbf{w}$ in $R^n$, what can you say about the basis $B$?


True-False  Exercises 


In  parts  (a)-(f)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a) If $B_1$ and $B_2$ are bases for a vector space $V$, then there exists a transition matrix from $B_1$ to $B_2$.

Answer:

True

(b) Transition matrices are invertible.

Answer:

True

(c) If $B$ is a basis for a vector space $R^n$, then $P_{B \to B}$ is the identity matrix.

Answer:

True

(d) If $P_{B_1 \to B_2}$ is a diagonal matrix, then each vector in $B_2$ is a scalar multiple of some vector in $B_1$.

Answer:

True

(e) If each vector in $B_2$ is a scalar multiple of some vector in $B_1$, then $P_{B_1 \to B_2}$ is a diagonal matrix.

Answer:

False

(f) If $A$ is a square matrix, then $A = P_{B_1 \to B_2}$ for some bases $B_1$ and $B_2$ for $R^n$.

Answer:

False
4.7 Row Space, Column Space, and Null Space

In  this  section  we  will  study  some  important  vector  spaces  that  are  associated  with  matrices.  Our  work  here  will  provide 
us  with  a deeper  understanding  of  the  relationships  between  the  solutions  of  a linear  system  and  properties  of  its 
coefficient  matrix. 


Row  Space,  Column  Space,  and  Null  Space 

Recall  that  vectors  can  be  written  in  comma-delimited  form  or  in  matrix  form  as  either  row  vectors  or  column  vectors. 
In  this  section  we  will  use  the  latter  two. 


DEFINITION  1 

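For an $m \times n$ matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

the vectors

$$\begin{aligned} \mathbf{r}_1 &= [a_{11} \quad a_{12} \quad \cdots \quad a_{1n}] \\ \mathbf{r}_2 &= [a_{21} \quad a_{22} \quad \cdots \quad a_{2n}] \\ &\ \ \vdots \\ \mathbf{r}_m &= [a_{m1} \quad a_{m2} \quad \cdots \quad a_{mn}] \end{aligned}$$

in $R^n$ that are formed from the rows of $A$ are called the row vectors of $A$, and the vectors

$$\mathbf{c}_1 = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}, \quad \ldots, \quad \mathbf{c}_n = \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}$$

in $R^m$ formed from the columns of $A$ are called the column vectors of $A$.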


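EXAMPLE 1 Row and Column Vectors of a 2 × 3 Matrix

Let

$$A = \begin{bmatrix} 2 & 1 & 0 \\ 3 & -1 & 4 \end{bmatrix}$$

The row vectors of $A$ are

$$\mathbf{r}_1 = [2 \quad 1 \quad 0] \quad\text{and}\quad \mathbf{r}_2 = [3 \quad -1 \quad 4]$$

and the column vectors of $A$ are

$$\mathbf{c}_1 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad \mathbf{c}_3 = \begin{bmatrix} 0 \\ 4 \end{bmatrix}$$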

The  following  definition  defines  three  important  vector  spaces  associated  with  a matrix. 


DEFINITION  2 

If $A$ is an $m \times n$ matrix, then the subspace of $R^n$ spanned by the row vectors of $A$ is called the row space of $A$, and the subspace of $R^m$ spanned by the column vectors of $A$ is called the column space of $A$. The solution space of the homogeneous system of equations $A\mathbf{x} = \mathbf{0}$, which is a subspace of $R^n$, is called the null space of $A$.


In  this  section  and  the  next  we  will  be  concerned  with  two  general  questions: 

Question  1.  What  relationships  exist  among  the  solutions  of  a linear  system  Ax  = b and  the  row  space,  column  space, 
and  null  space  of  the  coefficient  matrix  A? 

Question  2.  What  relationships  exist  among  the  row  space,  column  space,  and  null  space  of  a matrix? 

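Starting with the first question, suppose that

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \quad\text{and}\quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

It follows from Formula 10 of Section 1.3 that if $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$ denote the column vectors of $A$, then the product $A\mathbf{x}$ can be expressed as a linear combination of these vectors with coefficients from $\mathbf{x}$; that is,

$$A\mathbf{x} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n \tag{1}$$

Thus, a linear system, $A\mathbf{x} = \mathbf{b}$, of $m$ equations in $n$ unknowns can be written as

$$x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n = \mathbf{b} \tag{2}$$

from which we conclude that $A\mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b}$ is expressible as a linear combination of the column vectors of $A$. This yields the following theorem.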


THEOREM  4.7.1 

A system  of  linear  equations  Ax  = b is  consistent  if  and  only  if  b is  in  the  column  space  of  A. 

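EXAMPLE 2 A Vector b in the Column Space of A

Let $A\mathbf{x} = \mathbf{b}$ be the linear system

$$\begin{bmatrix} -1 & 3 & 2 \\ 1 & 2 & -3 \\ 2 & 1 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}$$

Show that $\mathbf{b}$ is in the column space of $A$ by expressing it as a linear combination of the column vectors of $A$.

Solution

Solving the system by Gaussian elimination yields (verify)

$$x_1 = 2, \quad x_2 = -1, \quad x_3 = 3$$

It follows from this and Formula 2 that

$$2\begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} - \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} + 3\begin{bmatrix} 2 \\ -3 \\ -2 \end{bmatrix} = \begin{bmatrix} 1 \\ -9 \\ -3 \end{bmatrix}$$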
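Theorem 4.7.1 suggests a mechanical test: row-reduce the augmented matrix $[A \mid \mathbf{b}]$ and check that no leading 1 lands in the last column. The Python sketch below illustrates this on the system of Example 2. The function names `rref` and `in_column_space` are hypothetical, and exact `Fraction` arithmetic is an assumption made to avoid rounding; it is a sketch, not the method prescribed by the text.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def in_column_space(A, b):
    """True when Ax = b is consistent, i.e. b lies in the column space of A."""
    augmented = [list(row) + [bi] for row, bi in zip(A, b)]
    _, pivots = rref(augmented)
    # A pivot in the last column corresponds to a row [0 ... 0 | 1].
    return len(A[0]) not in pivots

A = [[-1, 3, 2], [1, 2, -3], [2, 1, -2]]
b = [1, -9, -3]
consistent = in_column_space(A, b)

# For this system the solution is unique, so it can be read off the last column.
reduced, _ = rref([row + [bi] for row, bi in zip(A, b)])
x = [row[-1] for row in reduced]

# Rebuild b as the combination x1*c1 + x2*c2 + x3*c3 of the columns of A.
rebuilt_b = [sum(x[j] * A[i][j] for j in range(3)) for i in range(3)]
print(consistent, x, rebuilt_b)
```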

Recall from Theorem 3.4.4 that the general solution of a consistent linear system $A\mathbf{x} = \mathbf{b}$ can be obtained by adding any specific solution of this system to the general solution of the corresponding homogeneous system $A\mathbf{x} = \mathbf{0}$. Keeping in mind that the null space of $A$ is the same as the solution space of $A\mathbf{x} = \mathbf{0}$, we can rephrase that theorem in the following vector form.


THEOREM 4.7.2

If $\mathbf{x}_0$ is any solution of a consistent linear system $A\mathbf{x} = \mathbf{b}$, and if $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a basis for the null space of $A$, then every solution of $A\mathbf{x} = \mathbf{b}$ can be expressed in the form

$$\mathbf{x} = \mathbf{x}_0 + c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k \tag{3}$$

Conversely, for all choices of scalars $c_1, c_2, \ldots, c_k$, the vector $\mathbf{x}$ in this formula is a solution of $A\mathbf{x} = \mathbf{b}$.


Equation 3 gives a formula for the general solution of $A\mathbf{x} = \mathbf{b}$. The vector $\mathbf{x}_0$ in that formula is called a particular solution of $A\mathbf{x} = \mathbf{b}$, and the remaining part of the formula is called the general solution of $A\mathbf{x} = \mathbf{0}$. In words, this formula tells us the following:

The general solution of a consistent linear system can be expressed as the sum of a particular solution of that system and the general solution of the corresponding homogeneous system.

Geometrically, the solution set of $A\mathbf{x} = \mathbf{b}$ can be viewed as the translation by $\mathbf{x}_0$ of the solution space of $A\mathbf{x} = \mathbf{0}$ (Figure 4.7.1).

[Figure 4.7.1. Label shown: solution space of $A\mathbf{x} = \mathbf{0}$.]


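EXAMPLE 3 General Solution of a Linear System Ax = b

In the concluding subsection of Section 3.4 we compared solutions of the linear systems

$$\begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \\ 5 \\ 6 \end{bmatrix}$$

and deduced that the general solution $\mathbf{x}$ of the nonhomogeneous system and the general solution $\mathbf{x}_h$ of the corresponding homogeneous system (when written in column-vector form) are related by

$$\mathbf{x} = \underbrace{\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \tfrac{1}{3} \end{bmatrix}}_{\mathbf{x}_0} + \underbrace{r\begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{x}_h}$$

Recall from the Remark following Example 4 of Section 4.5 that the vectors in $\mathbf{x}_h$ form a basis for the solution space of $A\mathbf{x} = \mathbf{0}$.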
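Formula 3 can be spot-checked numerically for the system in Example 3. The sketch below is an illustration only: the names `x0`, `null_basis`, `general_solution`, and `mat_vec` are assumptions introduced here, and only a small grid of integer parameter values is tested, which is evidence rather than a proof.

```python
from fractions import Fraction
from itertools import product

A = [[1, 3, -2, 0, 2, 0],
     [2, 6, -5, -2, 4, -3],
     [0, 0, 5, 10, 0, 15],
     [2, 6, 0, 8, 4, 18]]
b = [0, -1, 5, 6]

# Particular solution x0 and null-space basis vectors from Example 3.
x0 = [0, 0, 0, 0, 0, Fraction(1, 3)]
null_basis = [[-3, 1, 0, 0, 0, 0],
              [-4, 0, -2, 1, 0, 0],
              [-2, 0, 0, 0, 1, 0]]

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def general_solution(coeffs):
    """Return x0 + c1*v1 + c2*v2 + c3*v3 as in Formula 3."""
    return [x0[i] + sum(c * v[i] for c, v in zip(coeffs, null_basis))
            for i in range(len(x0))]

# Every choice of scalars should give a solution of Ax = b.
all_solve = all(mat_vec(A, general_solution(c)) == b
                for c in product(range(-2, 3), repeat=3))

# Each basis vector alone should solve the homogeneous system Ax = 0.
basis_in_null_space = all(mat_vec(A, v) == [0, 0, 0, 0] for v in null_basis)
print(all_solve, basis_in_null_space)
```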


Bases  for  Row  Spaces,  Column  Spaces,  and  Null  Spaces 

We first developed elementary row operations for the purpose of solving linear systems, and we know from that work that performing an elementary row operation on an augmented matrix does not change the solution set of the corresponding linear system. It follows that applying an elementary row operation to a matrix $A$ does not change the solution set of the corresponding linear system $A\mathbf{x} = \mathbf{0}$, or, stated another way, it does not change the null space of $A$. Thus we have the following theorem.


THEOREM 4.7.3

Elementary row operations do not change the null space of a matrix.


The following theorem, whose proof is left as an exercise, is a companion to Theorem 4.7.3.


THEOREM 4.7.4

Elementary row operations do not change the row space of a matrix.


Theorems 4.7.3 and 4.7.4 might tempt you into incorrectly believing that elementary row operations do not change the column space of a matrix. To see why this is not true, compare the matrices

$$A = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 3 \\ 0 & 0 \end{bmatrix}$$

The matrix $B$ can be obtained from $A$ by adding $-2$ times the first row to the second. However, this operation has changed the column space of $A$, since that column space consists of all scalar multiples of

$$\begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

whereas the column space of $B$ consists of all scalar multiples of

$$\begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

and the two are different spaces.


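EXAMPLE 4 Finding a Basis for the Null Space of a Matrix

Find a basis for the null space of the matrix

$$A = \begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix}$$

Solution

The null space of $A$ is the solution space of the homogeneous linear system $A\mathbf{x} = \mathbf{0}$, which, as shown in Example 3, has the basis

$$\mathbf{v}_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

Remark Observe that the basis vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ in the last example are the vectors that result by successively setting one of the parameters in the general solution equal to 1 and the others equal to 0.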
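The remark above describes a recipe that is easy to mechanize: reduce $A$, then for each free variable set that variable to 1 and the other free variables to 0. The following Python sketch applies it to the matrix of Example 4. The function names `rref` and `null_space_basis` are hypothetical, and the sketch assumes exact rational arithmetic via `fractions.Fraction`.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space_basis(A):
    """One basis vector per free variable: that variable is 1, other free ones 0."""
    reduced, pivots = rref(A)
    n = len(A[0])
    free_columns = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free_columns:
        vec = [Fraction(0)] * n
        vec[f] = Fraction(1)
        # Row k of the reduced matrix holds the pivot in column pivots[k].
        for k, p in enumerate(pivots):
            vec[p] = -reduced[k][f]
        basis.append(vec)
    return basis

A = [[1, 3, -2, 0, 2, 0],
     [2, 6, -5, -2, 4, -3],
     [0, 0, 5, 10, 0, 15],
     [2, 6, 0, 8, 4, 18]]
basis = null_space_basis(A)
for v in basis:
    print([str(x) for x in v])
```

Because elementary row operations do not change the null space (Theorem 4.7.3), working from the reduced matrix rather than from $A$ itself is legitimate.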

The  following  theorem  makes  it  possible  to  find  bases  for  the  row  and  column  spaces  of  a matrix  in  row  echelon  form 
by  inspection. 


THEOREM  4.7.5 

If a matrix $R$ is in row echelon form, then the row vectors with the leading 1's (the nonzero row vectors) form a basis for the row space of $R$, and the column vectors with the leading 1's of the row vectors form a basis for the column space of $R$.


The proof involves little more than an analysis of the positions of the 0's and 1's of $R$. We omit the details.


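EXAMPLE 5 Bases for Row and Column Spaces

The matrix

$$R = \begin{bmatrix} 1 & -2 & 5 & 0 & 3 \\ 0 & 1 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

is in row echelon form. From Theorem 4.7.5, the vectors

$$\begin{aligned} \mathbf{r}_1 &= [1 \quad -2 \quad 5 \quad 0 \quad 3] \\ \mathbf{r}_2 &= [0 \quad 1 \quad 3 \quad 0 \quad 0] \\ \mathbf{r}_3 &= [0 \quad 0 \quad 0 \quad 1 \quad 0] \end{aligned}$$

form a basis for the row space of $R$, and the vectors

$$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{c}_4 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

form a basis for the column space of $R$.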


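EXAMPLE 6 Basis for a Row Space by Row Reduction

Find a basis for the row space of the matrix

$$A = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 2 & -6 & 9 & -1 & 8 & 2 \\ 2 & -6 & 9 & -1 & 9 & 7 \\ -1 & 3 & -4 & 2 & -5 & -4 \end{bmatrix}$$

Solution

Since elementary row operations do not change the row space of a matrix, we can find a basis for the row space of $A$ by finding a basis for the row space of any row echelon form of $A$. Reducing $A$ to row echelon form, we obtain (verify)

$$R = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 0 & 0 & 1 & 3 & -2 & -6 \\ 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

By Theorem 4.7.5, the nonzero row vectors of $R$ form a basis for the row space of $R$ and hence form a basis for the row space of $A$. These basis vectors are

$$\begin{aligned} \mathbf{r}_1 &= [1 \quad -3 \quad 4 \quad -2 \quad 5 \quad 4] \\ \mathbf{r}_2 &= [0 \quad 0 \quad 1 \quad 3 \quad -2 \quad -6] \\ \mathbf{r}_3 &= [0 \quad 0 \quad 0 \quad 0 \quad 1 \quad 5] \end{aligned}$$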


The problem of finding a basis for the column space of the matrix $A$ in Example 6 is complicated by the fact that an elementary row operation can alter its column space. However, the good news is that elementary row operations do not alter dependence relationships among the column vectors. To make this more precise, suppose that $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k$ are linearly dependent column vectors of $A$, so there are scalars $c_1, c_2, \ldots, c_k$ that are not all zero and such that

$$c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + \cdots + c_k\mathbf{w}_k = \mathbf{0} \tag{4}$$

If we perform an elementary row operation on $A$, then these vectors will be changed into new column vectors $\mathbf{w}_1', \mathbf{w}_2', \ldots, \mathbf{w}_k'$. At first glance it would seem possible that the transformed vectors might be linearly independent. However, this is not so, since it can be proved that these new column vectors will be linearly dependent and, in fact, related by an equation

$$c_1\mathbf{w}_1' + c_2\mathbf{w}_2' + \cdots + c_k\mathbf{w}_k' = \mathbf{0}$$

that has exactly the same coefficients as 4. It follows from the fact that elementary row operations are reversible that they also preserve linear independence among column vectors (why?). The following theorem summarizes all of these results.


THEOREM  4.7.6 

If  A and  B are  row  equivalent  matrices,  then: 

(a)  A given  set  of  column  vectors  of  A is  linearly  independent  if  and  only  if  the  corresponding  column  vectors 
of  B are  linearly  independent. 


(b)  A given  set  of  column  vectors  of  A forms  a basis  for  the  column  space  of  A if  and  only  if  the  corresponding 
column  vectors  of  B form  a basis  for  the  column  space  of  B. 


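EXAMPLE 7 Basis for a Column Space by Row Reduction

Find a basis for the column space of the matrix

$$A = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 2 & -6 & 9 & -1 & 8 & 2 \\ 2 & -6 & 9 & -1 & 9 & 7 \\ -1 & 3 & -4 & 2 & -5 & -4 \end{bmatrix}$$

Solution

We observed in Example 6 that the matrix

$$R = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 0 & 0 & 1 & 3 & -2 & -6 \\ 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

is a row echelon form of $A$. Keeping in mind that $A$ and $R$ can have different column spaces, we cannot find a basis for the column space of $A$ directly from the column vectors of $R$. However, it follows from Theorem 4.7.6b that if we can find a set of column vectors of $R$ that forms a basis for the column space of $R$, then the corresponding column vectors of $A$ will form a basis for the column space of $A$.

Since the first, third, and fifth columns of $R$ contain the leading 1's of the row vectors, the vectors

$$\mathbf{c}_1' = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{c}_3' = \begin{bmatrix} 4 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{c}_5' = \begin{bmatrix} 5 \\ -2 \\ 1 \\ 0 \end{bmatrix}$$

form a basis for the column space of $R$. Thus, the corresponding column vectors of $A$, which are

$$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 2 \\ 2 \\ -1 \end{bmatrix}, \quad \mathbf{c}_3 = \begin{bmatrix} 4 \\ 9 \\ 9 \\ -4 \end{bmatrix}, \quad \mathbf{c}_5 = \begin{bmatrix} 5 \\ 8 \\ 9 \\ -5 \end{bmatrix}$$

form a basis for the column space of $A$.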
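The method of Example 7 amounts to locating the pivot columns of a reduced form of $A$ and then taking the same-numbered columns of $A$ itself. A minimal Python sketch follows; `rref` and `column_space_basis` are hypothetical names, and the sketch reduces all the way to reduced row echelon form, which has its leading 1's in the same columns as the row echelon form used in the text.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def column_space_basis(A):
    """Return (pivot column indices, the corresponding columns of A itself)."""
    _, pivots = rref(A)
    # Take columns from the ORIGINAL matrix, not from the reduced one,
    # because row operations can change the column space.
    columns = [[row[c] for row in A] for c in pivots]
    return pivots, columns

A = [[1, -3, 4, -2, 5, 4],
     [2, -6, 9, -1, 8, 2],
     [2, -6, 9, -1, 9, 7],
     [-1, 3, -4, 2, -5, -4]]
pivots, basis_columns = column_space_basis(A)
print(pivots, basis_columns)
```

Column indices in the code are zero-based, so pivots 0, 2, 4 correspond to the first, third, and fifth columns.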


Up  to  now  we  have  focused  on  methods  for  finding  bases  associated  with  matrices.  Those  methods  can  readily  be 
adapted  to  the  more  general  problem  of  finding  a basis  for  the  space  spanned  by  a set  of  vectors  in  Rn. 

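EXAMPLE 8 Basis for a Vector Space Using Row Operations

Find a basis for the subspace of $R^5$ spanned by the vectors

$$\mathbf{v}_1 = (1, -2, 0, 0, 3), \quad \mathbf{v}_2 = (2, -5, -3, -2, 6),$$
$$\mathbf{v}_3 = (0, 5, 15, 10, 0), \quad \mathbf{v}_4 = (2, 6, 18, 8, 6)$$

Solution

The space spanned by these vectors is the row space of the matrix

$$\begin{bmatrix} 1 & -2 & 0 & 0 & 3 \\ 2 & -5 & -3 & -2 & 6 \\ 0 & 5 & 15 & 10 & 0 \\ 2 & 6 & 18 & 8 & 6 \end{bmatrix}$$

Reducing this matrix to row echelon form, we obtain

$$\begin{bmatrix} 1 & -2 & 0 & 0 & 3 \\ 0 & 1 & 3 & 2 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

The nonzero row vectors in this matrix are

$$\mathbf{w}_1 = (1, -2, 0, 0, 3), \quad \mathbf{w}_2 = (0, 1, 3, 2, 0), \quad \mathbf{w}_3 = (0, 0, 1, 1, 0)$$

These vectors form a basis for the row space and consequently form a basis for the subspace of $R^5$ spanned by $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and $\mathbf{v}_4$.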


Bases  Formed  from  Row  and  Column  Vectors  of  a Matrix 


In  all  of  the  examples  we  have  considered  thus  far  we  have  looked  for  bases  in  which  no  restrictions  were  imposed  on 
the  individual  vectors  in  the  basis.  We  now  want  to  focus  on  the  problem  of  finding  a basis  for  the  row  space  of  a matrix 
A consisting  entirely  of  row  vectors  from  A and  a basis  for  the  column  space  of  A consisting  entirely  of  column  vectors 
of  A. 


Looking  back  on  our  earlier  work,  we  see  that  the  procedure  followed  in  Example  7 did,  in  fact,  produce  a basis  for  the 
column  space  of  A consisting  of  column  vectors  of  A,  whereas  the  procedure  used  in  Example  6 produced  a basis  for  the 
row  space  of  A,  but  that  basis  did  not  consist  of  row  vectors  of  A.  The  following  example  shows  how  to  adapt  the 
procedure  from  Example  7 to  find  a basis  for  the  row  space  of  a matrix  that  is  formed  from  its  row  vectors. 


EXAMPLE  9 Basis  for  the  Row  Space  of  a Matrix 

Find a basis for the row space of

$$A = \begin{bmatrix} 1 & -2 & 0 & 0 & 3 \\ 2 & -5 & -3 & -2 & 6 \\ 0 & 5 & 15 & 10 & 0 \\ 2 & 6 & 18 & 8 & 6 \end{bmatrix}$$

consisting entirely of row vectors from A.

We will transpose A, thereby converting the row space of A into the column space of $A^T$; then
we will use the method of Example 7 to find a basis for the column space of $A^T$; and then we will
transpose again to convert column vectors back to row vectors. Transposing A yields


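$$A^T = \begin{bmatrix} 1 & 2 & 0 & 2 \\ -2 & -5 & 5 & 6 \\ 0 & -3 & 15 & 18 \\ 0 & -2 & 10 & 8 \\ 3 & 6 & 0 & 6 \end{bmatrix}$$

Reducing this matrix to row echelon form yields

$$\begin{bmatrix} 1 & 2 & 0 & 2 \\ 0 & 1 & -5 & -10 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$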

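The first, second, and fourth columns contain the leading 1's, so the corresponding column vectors in
$A^T$ form a basis for the column space of $A^T$; these are

$$\mathbf{c}_1 = \begin{bmatrix} 1 \\ -2 \\ 0 \\ 0 \\ 3 \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} 2 \\ -5 \\ -3 \\ -2 \\ 6 \end{bmatrix}, \quad \mathbf{c}_4 = \begin{bmatrix} 2 \\ 6 \\ 18 \\ 8 \\ 6 \end{bmatrix}$$

Transposing again and adjusting the notation appropriately yields the basis vectors

$$\mathbf{r}_1 = [1 \ \ {-2} \ \ 0 \ \ 0 \ \ 3], \quad \mathbf{r}_2 = [2 \ \ {-5} \ \ {-3} \ \ {-2} \ \ 6], \quad \text{and} \quad \mathbf{r}_4 = [2 \ \ 6 \ \ 18 \ \ 8 \ \ 6]$$

for the row space of A.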
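
The transposition idea of Example 9 can also be checked computationally. The sketch below is an illustration under our own naming (`rref` is a hypothetical helper, not something defined in the text); it row reduces $A^T$ and reads off which rows of A correspond to the pivot columns.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and 0-based pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

# The matrix A of Example 9.
A = [[1, -2, 0, 0, 3],
     [2, -5, -3, -2, 6],
     [0, 5, 15, 10, 0],
     [2, 6, 18, 8, 6]]

# Rows of A become columns of the transpose.
A_T = [list(col) for col in zip(*A)]
_, pivots = rref(A_T)

# Pivot columns of the transpose point back to rows of A.
row_basis = [A[i] for i in pivots]
print(pivots)
print(row_basis)
```

With 0-based indexing the pivots 0, 1, 3 correspond to the rows called r1, r2, and r4 in the example.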

Next,  we  will  give  an  example  that  adapts  the  methods  we  have  developed  above  to  solve  the  following  general 
problem  in  Rn: 

PROBLEM

Given a set of vectors $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ in $R^n$, find a subset of these vectors that forms a basis for span(S),
and express those vectors that are not in that basis as a linear combination of the basis vectors.


EXAMPLE  10  Basis  and  Linear  Combinations 

Find  a subset  of  the  vectors 

$$\mathbf{v}_1 = (1, -2, 0, 3), \quad \mathbf{v}_2 = (2, -5, -3, 6),$$
$$\mathbf{v}_3 = (0, 1, 3, 0), \quad \mathbf{v}_4 = (2, -1, 4, -7), \quad \mathbf{v}_5 = (5, -8, 1, 2)$$

that  forms  a basis  for  the  space  spanned  by  these  vectors. 

Express  each  vector  not  in  the  basis  as  a linear  combination  of  the  basis  vectors. 


Solution 


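We begin by constructing a matrix that has $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_5$ as its column vectors
(the columns are $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4, \mathbf{v}_5$ from left to right):

$$\begin{bmatrix} 1 & 2 & 0 & 2 & 5 \\ -2 & -5 & 1 & -1 & -8 \\ 0 & -3 & 3 & 4 & 1 \\ 3 & 6 & 0 & -7 & 2 \end{bmatrix} \qquad (5)$$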


The first part of our problem can be solved by finding a basis for the column space of this matrix.
Reducing the matrix to reduced row echelon form and denoting the column vectors of the resulting
matrix by $\mathbf{w}_1$, $\mathbf{w}_2$, $\mathbf{w}_3$, $\mathbf{w}_4$, and $\mathbf{w}_5$ (from left to right) yields

$$\begin{bmatrix} 1 & 0 & 2 & 0 & 1 \\ 0 & 1 & -1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \qquad (6)$$


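The leading 1's occur in columns 1, 2, and 4, so by Theorem 4.7.5,

$$\{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_4\}$$

is a basis for the column space of the matrix in (6), and consequently,

$$\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_4\}$$

is a basis for the column space of the matrix in (5).

We will start by expressing $\mathbf{w}_3$ and $\mathbf{w}_5$ as linear combinations of the basis vectors $\mathbf{w}_1$, $\mathbf{w}_2$, $\mathbf{w}_4$. The
simplest way of doing this is to express $\mathbf{w}_3$ and $\mathbf{w}_5$ in terms of basis vectors with smaller subscripts.
Accordingly, we will express $\mathbf{w}_3$ as a linear combination of $\mathbf{w}_1$ and $\mathbf{w}_2$, and we will express $\mathbf{w}_5$ as a
linear combination of $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_4$. By inspection of (6), these linear combinations are

$$\mathbf{w}_3 = 2\mathbf{w}_1 - \mathbf{w}_2$$
$$\mathbf{w}_5 = \mathbf{w}_1 + \mathbf{w}_2 + \mathbf{w}_4$$

We call these the dependency equations. The corresponding relationships in (5) are

$$\mathbf{v}_3 = 2\mathbf{v}_1 - \mathbf{v}_2$$
$$\mathbf{v}_5 = \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_4$$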


The  following  is  a summary  of  the  steps  that  we  followed  in  our  last  example  to  solve  the  problem  posed  above. 

Basis  for  Span(S) 

Step 1. Form the matrix A having the vectors in $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ as column vectors.

Step 2. Reduce the matrix A to reduced row echelon form R.

Step 3. Denote the column vectors of R by $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k$.

Step 4. Identify the columns of R that contain the leading 1's. The corresponding column vectors of A form a basis for
span(S).

This  completes  the  first  part  of  the  problem. 

Step  5.  Obtain  a set  of  dependency  equations  by  expressing  each  column  vector  of  R that  does  not  contain  a leading  1 
as  a linear  combination  of  preceding  column  vectors  that  do  contain  leading  1 's. 


Step  6.  Replace  the  column  vectors  of  R that  appear  in  the  dependency  equations  by  the  corresponding  column  vectors 
of  A. 

This  completes  the  second  part  of  the  problem. 
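
Steps 1 through 6 can be sketched as a short program. This is only an illustration: the function names `rref` and `basis_and_dependencies` are our own, and the code relies on the fact that in a reduced row echelon form the entries of a non-pivot column are exactly its coefficients relative to the preceding pivot columns.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and 0-based pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def basis_and_dependencies(vectors):
    """Return (basis indices, {dependent index: {basis index: coefficient}})."""
    A = [list(row) for row in zip(*vectors)]    # Step 1: vectors as columns
    R, pivots = rref(A)                         # Steps 2-4
    deps = {}
    for j in range(len(vectors)):               # Steps 5-6
        if j not in pivots:
            deps[j] = {p: R[i][j] for i, p in enumerate(pivots) if R[i][j] != 0}
    return pivots, deps

# The vectors v1, ..., v5 of Example 10 (stored at indices 0, ..., 4).
vectors = [(1, -2, 0, 3), (2, -5, -3, 6), (0, 1, 3, 0),
           (2, -1, 4, -7), (5, -8, 1, 2)]
pivots, deps = basis_and_dependencies(vectors)
print(pivots)
print(deps)
```

Read with 0-based indices, the result says that v1, v2, v4 form the basis and reproduces the two dependency equations of Example 10.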


Concept  Review 

• Row vectors
• Column vectors
• Row space
• Column space
• Null space
• General solution
• Particular solution
• Relationships among linear systems and row spaces, column spaces, and null spaces
• Relationships among the row space, column space, and null space of a matrix
• Dependency equations

Skills 

Determine  whether  a given  vector  is  in  the  column  space  of  a matrix;  if  it  is,  express  it  as  a linear 
combination  of  the  column  vectors  of  the  matrix. 

Find  a basis  for  the  null  space  of  a matrix. 

Find  a basis  for  the  row  space  of  a matrix. 

Find  a basis  for  the  column  space  of  a matrix. 

Find  a basis  for  the  span  of  a set  of  vectors  in  Rn. 


Exercise  Set  4.7 

1.  List  the  row  vectors  and  column  vectors  of  the  matrix 

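$$\begin{bmatrix} 2 & -1 & 0 & 1 \\ 3 & 5 & 7 & -1 \\ 1 & 4 & 2 & 7 \end{bmatrix}$$

Answer:

$\mathbf{r}_1 = (2, -1, 0, 1)$, $\mathbf{r}_2 = (3, 5, 7, -1)$, $\mathbf{r}_3 = (1, 4, 2, 7)$;

$$\mathbf{c}_1 = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}, \quad \mathbf{c}_2 = \begin{bmatrix} -1 \\ 5 \\ 4 \end{bmatrix}, \quad \mathbf{c}_3 = \begin{bmatrix} 0 \\ 7 \\ 2 \end{bmatrix}, \quad \mathbf{c}_4 = \begin{bmatrix} 1 \\ -1 \\ 7 \end{bmatrix}$$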

2.  Express  the  product  as  a linear  combination  of  the  column  vectors  of  A. 


3.  Determine  whether  b is  in  the  column  space  of  A,  and  if  so,  express  b as  a linear  combination  of  the  column  vectors 
of  A. 


(b) $$A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 3 \end{bmatrix}; \quad \mathbf{b} = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}$$


(b)  b is  not  in  the  column  space  of  A. 


4. Suppose that $x_1 = -1$, $x_2 = 2$, $x_3 = 4$, $x_4 = -3$ is a solution of a nonhomogeneous linear system $A\mathbf{x} = \mathbf{b}$ and that
the solution set of the homogeneous system $A\mathbf{x} = \mathbf{0}$ is given by the formulas

$$x_1 = -3r + 4s, \quad x_2 = r - s, \quad x_3 = r, \quad x_4 = s$$

(a) Find a vector form of the general solution of $A\mathbf{x} = \mathbf{0}$.

(b) Find a vector form of the general solution of $A\mathbf{x} = \mathbf{b}$.

5. In parts (a)-(d), find the vector form of the general solution of the given linear system $A\mathbf{x} = \mathbf{b}$; then use that result to
find the vector form of the general solution of $A\mathbf{x} = \mathbf{0}$.

(a) $$\begin{aligned} x_1 - 3x_2 &= 1 \\ 2x_1 - 6x_2 &= 2 \end{aligned}$$

(b) $$\begin{aligned} x_1 + x_2 + 2x_3 &= 5 \\ x_1 + x_3 &= -2 \\ 2x_1 + x_2 + 3x_3 &= 3 \end{aligned}$$

(c) $$\begin{aligned} x_1 - 2x_2 + x_3 + 2x_4 &= -1 \\ 2x_1 - 4x_2 + 2x_3 + 4x_4 &= -2 \\ -x_1 + 2x_2 - x_3 - 2x_4 &= 1 \\ 3x_1 - 6x_2 + 3x_3 + 6x_4 &= -3 \end{aligned}$$

(d) $$\begin{aligned} x_1 + 2x_2 - 3x_3 + x_4 &= 4 \\ -2x_1 + x_2 + 2x_3 + x_4 &= -1 \\ -x_1 + 3x_2 - x_3 + 2x_4 &= 3 \\ 4x_1 - 7x_2 - 5x_4 &= -5 \end{aligned}$$


Answer: 


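(b) $$\begin{bmatrix} -2 \\ 7 \\ 0 \end{bmatrix} + t \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}; \quad t \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}$$

(d) $$\begin{bmatrix} \frac{6}{5} \\ \frac{7}{5} \\ 0 \\ 0 \end{bmatrix} + s \begin{bmatrix} \frac{7}{5} \\ \frac{4}{5} \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} \frac{1}{5} \\ -\frac{3}{5} \\ 0 \\ 1 \end{bmatrix}; \quad s \begin{bmatrix} \frac{7}{5} \\ \frac{4}{5} \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} \frac{1}{5} \\ -\frac{3}{5} \\ 0 \\ 1 \end{bmatrix}$$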

6.  Find  a basis  for  the  null  space  of  A. 

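(a) $$A = \begin{bmatrix} 1 & -1 & 3 \\ 5 & -4 & -4 \\ 7 & -6 & 2 \end{bmatrix}$$

(b) $$A = \begin{bmatrix} 2 & 0 & -1 \\ 4 & 0 & -2 \\ 0 & 0 & 0 \end{bmatrix}$$

(c) $$A = \begin{bmatrix} 1 & 4 & 5 & 2 \\ 2 & 1 & 3 & 0 \\ -1 & 3 & 2 & 2 \end{bmatrix}$$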

(d)  f 1 4 5 


2 3 5 

(e) $$A = \begin{bmatrix} 1 & -3 & 2 & 2 & 1 \\ 0 & 3 & 6 & 0 & -3 \\ 2 & -3 & -2 & 4 & 4 \\ 3 & -6 & 0 & 6 & 5 \\ -2 & 9 & 2 & -4 & -5 \end{bmatrix}$$

7.  In  each  part,  a matrix  in  row  echelon  form  is  given.  By  inspection,  find  bases  for  the  row  and  column  spaces  of  A. 

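(a) $$\begin{bmatrix} 1 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

(b) $$\begin{bmatrix} 1 & -3 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

(c) $$\begin{bmatrix} 1 & 2 & 4 & 5 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

(d) $$\begin{bmatrix} 1 & 2 & -1 & 5 \\ 0 & 1 & 4 & 3 \\ 0 & 0 & 1 & -7 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$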

Answer: 

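(a) $\mathbf{r}_1 = [1 \ \ 0 \ \ 2]$, $\mathbf{r}_2 = [0 \ \ 0 \ \ 1]$, $\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$

(b) $\mathbf{r}_1 = [1 \ \ {-3} \ \ 0 \ \ 0]$, $\mathbf{r}_2 = [0 \ \ 1 \ \ 0 \ \ 0]$, $\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}$

(c) $\mathbf{r}_1 = [1 \ \ 2 \ \ 4 \ \ 5]$, $\mathbf{r}_2 = [0 \ \ 1 \ \ {-3} \ \ 0]$, $\mathbf{r}_3 = [0 \ \ 0 \ \ 1 \ \ {-3}]$, $\mathbf{r}_4 = [0 \ \ 0 \ \ 0 \ \ 1]$,
$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_3 = \begin{bmatrix} 4 \\ -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_4 = \begin{bmatrix} 5 \\ 0 \\ -3 \\ 1 \\ 0 \end{bmatrix}$

(d) $\mathbf{r}_1 = [1 \ \ 2 \ \ {-1} \ \ 5]$, $\mathbf{r}_2 = [0 \ \ 1 \ \ 4 \ \ 3]$, $\mathbf{r}_3 = [0 \ \ 0 \ \ 1 \ \ {-7}]$, $\mathbf{r}_4 = [0 \ \ 0 \ \ 0 \ \ 1]$,
$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix}$, $\mathbf{c}_3 = \begin{bmatrix} -1 \\ 4 \\ 1 \\ 0 \end{bmatrix}$, $\mathbf{c}_4 = \begin{bmatrix} 5 \\ 3 \\ -7 \\ 1 \end{bmatrix}$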

8.  For  the  matrices  in  Exercise  6,  find  a basis  for  the  row  space  of  A by  reducing  the  matrix  to  row  echelon  form. 

9.  By  inspection,  find  a basis  for  the  row  space  and  a basis  for  the  column  space  of  each  matrix. 


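(a) $$\begin{bmatrix} 1 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

(b) $$\begin{bmatrix} 1 & -3 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

(c) $$\begin{bmatrix} 1 & 2 & 4 & 5 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

(d) $$\begin{bmatrix} 1 & 2 & -1 & 5 \\ 0 & 1 & 4 & 3 \\ 0 & 0 & 1 & -7 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$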

Answer: 


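(a) $\mathbf{r}_1 = [1 \ \ 0 \ \ 2]$; $\mathbf{r}_2 = [0 \ \ 0 \ \ 1]$; $\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$

(b) $\mathbf{r}_1 = [1 \ \ {-3} \ \ 0 \ \ 0]$; $\mathbf{r}_2 = [0 \ \ 1 \ \ 0 \ \ 0]$; $\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_2 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}$

(c) $\mathbf{r}_1 = [1 \ \ 2 \ \ 4 \ \ 5]$; $\mathbf{r}_2 = [0 \ \ 1 \ \ {-3} \ \ 0]$; $\mathbf{r}_3 = [0 \ \ 0 \ \ 1 \ \ {-3}]$; $\mathbf{r}_4 = [0 \ \ 0 \ \ 0 \ \ 1]$;
$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_3 = \begin{bmatrix} 4 \\ -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_4 = \begin{bmatrix} 5 \\ 0 \\ -3 \\ 1 \\ 0 \end{bmatrix}$

(d) $\mathbf{r}_1 = [1 \ \ 2 \ \ {-1} \ \ 5]$; $\mathbf{r}_2 = [0 \ \ 1 \ \ 4 \ \ 3]$; $\mathbf{r}_3 = [0 \ \ 0 \ \ 1 \ \ {-7}]$; $\mathbf{r}_4 = [0 \ \ 0 \ \ 0 \ \ 1]$;
$\mathbf{c}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_2 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \end{bmatrix}$; $\mathbf{c}_3 = \begin{bmatrix} -1 \\ 4 \\ 1 \\ 0 \end{bmatrix}$; $\mathbf{c}_4 = \begin{bmatrix} 5 \\ 3 \\ -7 \\ 1 \end{bmatrix}$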

10.  For  the  matrices  in  Exercise  6,  find  a basis  for  the  row  space  of  A consisting  entirely  of  row  vectors  of  A. 

11. Find a basis for the subspace of $R^4$ spanned by the given vectors.

(a) (1, 1, -4, -3), (2, 0, 2, -2), (2, -1, 3, 2)

(b) (-1, 1, -2, 0), (3, 3, 6, 0), (9, 0, 0, 3)

(c) (1, 1, 0, 0), (0, 0, 1, 1), (-2, 0, 2, 2), (0, -3, 0, 3)


Answer:

(a) (1, 1, -4, -3), (0, 1, -5, -2), (0, 0, 1, -1/2)

(b) (1, -1, 2, 0), (0, 1, 0, 0), (0, 0, 1, -1/6)

(c) (1, 1, 0, 0), (0, 1, 1, 1), (0, 0, 1, 1), (0, 0, 0, 1)

12.  Find  a subset  of  the  vectors  that  forms  a basis  for  the  space  spanned  by  the  vectors;  then  express  each  vector  that  is 
not  in  the  basis  as  a linear  combination  of  the  basis  vectors. 

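(a) $\mathbf{v}_1 = (1, 0, 1, 1)$, $\mathbf{v}_2 = (-3, 3, 7, 1)$, $\mathbf{v}_3 = (-1, 3, 9, 3)$, $\mathbf{v}_4 = (-5, 3, 5, -1)$

(b) $\mathbf{v}_1 = (1, -2, 0, 3)$, $\mathbf{v}_2 = (2, -4, 0, 6)$, $\mathbf{v}_3 = (-1, 1, 2, 0)$, $\mathbf{v}_4 = (0, -1, 2, 3)$

(c) $\mathbf{v}_1 = (1, -1, 5, 2)$, $\mathbf{v}_2 = (-2, 3, 1, 0)$, $\mathbf{v}_3 = (4, -5, 9, 4)$, $\mathbf{v}_4 = (0, 4, 2, -3)$, $\mathbf{v}_5 = (-7, 18, 2, -8)$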

13.  Prove  that  the  row  vectors  of  an  n x n invertible  matrix  A form  a basis  for  Rn. 

14.  Construct  a matrix  whose  null  space  consists  of  all  linear  combinations  of  the  vectors 


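$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \\ 3 \\ 2 \end{bmatrix} \quad \text{and} \quad \mathbf{v}_2 = \begin{bmatrix} 2 \\ 0 \\ -2 \\ 4 \end{bmatrix}$$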

15. (a) Let

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

Show that relative to an xyz-coordinate system in 3-space the null space of A consists of all points on the z-axis
and that the column space consists of all points in the xy-plane (see the accompanying figure).

(b) Find a 3 x 3 matrix whose null space is the x-axis and whose column space is the yz-plane.

Figure Ex-15 (the null space of A and the column space of A shown relative to the x-, y-, and z-axes)


Answer: 


(b) $$\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$


16.  Find  a 3 x 3 matrix  whose  null  space  is 

(a)  a point. 

(b)  a line. 

(c)  a plane. 

17. (a) Find all 2 x 2 matrices whose null space is the line $3x - 5y = 0$.

(b) Sketch the null spaces of the following matrices:

$$A = \begin{bmatrix} 1 & 4 \\ 0 & 5 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix}, \quad C = \begin{bmatrix} 6 & 2 \\ 3 & 1 \end{bmatrix}, \quad D = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$

Answer:

(a) $$\begin{bmatrix} 3a & -5a \\ 3b & -5b \end{bmatrix}$$

for all real numbers a, b not both 0.

(b) Since A and B are invertible, their null spaces are the origin. The null space of C is the line $3x + y = 0$. The null
space of D is the entire xy-plane.


18. The equation $x_1 + x_2 + x_3 = 1$ can be viewed as a linear system of one equation in three unknowns. Express its
general solution as a particular solution plus the general solution of the corresponding homogeneous system.
[Suggestion: Write the vectors in column form.]

19. Suppose that A and B are n x n matrices and A is invertible. Invent and prove a theorem that describes how the row
spaces of AB and B are related.

True-False  Exercises 


In  parts  (a)-(j)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a) The span of $\mathbf{v}_1, \ldots, \mathbf{v}_n$ is the column space of the matrix whose column vectors are $\mathbf{v}_1, \ldots, \mathbf{v}_n$.

Answer: 

True 

(b) The column space of a matrix A is the set of solutions of Ax = b.
Answer: 

False 

(c) If R is the reduced row echelon form of A, then those column vectors of R that contain the leading 1's form a basis for
the column space of A.


Answer: 


False 

(d)  The  set  of  nonzero  row  vectors  of  a matrix  A is  a basis  for  the  row  space  of  A. 

Answer: 

False 

(e)  If  A and  B are  n x n matrices  that  have  the  same  row  space,  then  A and  B have  the  same  column  space. 

Answer: 

False 

(f) If E is an m x m elementary matrix and A is an m x n matrix, then the null space of EA is the same as the null space
of A.

Answer: 

True 

(g) If E is an m x m elementary matrix and A is an m x n matrix, then the row space of EA is the same as the row space
of A.

Answer: 

True 

(h) If E is an m x m elementary matrix and A is an m x n matrix, then the column space of EA is the same as the column
space of A.

Answer: 

False 

(i)  The  system  Ax  = b is  inconsistent  if  and  only  if  b is  not  in  the  column  space  of  A. 

Answer: 

True 

(j)  There  is  an  invertible  matrix  A and  a singular  matrix  B such  that  the  row  spaces  of  A and  B are  the  same. 

Answer: 

False

4.8 Rank, Nullity, and the Fundamental Matrix Spaces

In  the  last  section  we  investigated  relationships  between  a system  of  linear  equations  and  the  row  space,  column 
space,  and  null  space  of  its  coefficient  matrix.  In  this  section  we  will  be  concerned  with  the  dimensions  of  those 
spaces. The results we obtain will provide a deeper insight into the relationship between a linear system and its
coefficient  matrix. 


Row  and  Column  Spaces  Have  Equal  Dimensions 

In Examples 6 and 7 of Section 4.7 we found that the row and column spaces of the matrix

$$A = \begin{bmatrix} 1 & -3 & 4 & -2 & 5 & 4 \\ 2 & -6 & 9 & -1 & 8 & 2 \\ 2 & -6 & 9 & -1 & 9 & 7 \\ -1 & 3 & -4 & 2 & -5 & -4 \end{bmatrix}$$

both have three basis vectors and hence are both three-dimensional. The fact that these spaces have the same
dimension is not accidental, but rather a consequence of the following theorem.


THEOREM  4.8.1 

The  row  space  and  column  space  of  a matrix  A have  the  same  dimension. 


Proof Let R be any row echelon form of A. It follows from Theorem 4.7.4 and Theorem 4.7.6b that

dim(row space of A) = dim(row space of R)
dim(column space of A) = dim(column space of R)

so it suffices to show that the row and column spaces of R have the same dimension. But the dimension of the row
space of R is the number of nonzero rows, and by Theorem 4.7.5 the dimension of the column space of R is the
number of leading 1's. Since each nonzero row of R contains exactly one leading 1, these two numbers are the same,
so the row and column spaces have the same dimension.


Rank  and  Nullity 

The  dimensions  of  the  row  space,  column  space,  and  null  space  of  a matrix  are  such  important  numbers  that  there  is 
some  notation  and  terminology  associated  with  them. 


DEFINITION  1 


The common dimension of the row space and column space of a matrix A is called the rank of A and is
denoted by rank(A); the dimension of the null space of A is called the nullity of A and is denoted by
nullity(A).


The proof of Theorem 4.8.1 shows that the rank
of A can be interpreted as the number of leading
1's in any row echelon form of A.


EXAMPLE  1 Rank  and  Nullity  of  a 4 x 6 Matrix 


Find the rank and nullity of the matrix

$$A = \begin{bmatrix} -1 & 2 & 0 & 4 & 5 & -3 \\ 3 & -7 & 2 & 0 & 1 & 4 \\ 2 & -5 & 2 & 4 & 6 & 1 \\ 4 & -9 & 2 & -4 & -4 & 7 \end{bmatrix}$$

The reduced row echelon form of A is

$$\begin{bmatrix} 1 & 0 & -4 & -28 & -37 & 13 \\ 0 & 1 & -2 & -12 & -16 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \qquad (1)$$

(verify). Since this matrix has two leading 1's, its row and column spaces are two-dimensional and
rank(A) = 2. To find the nullity of A, we must find the dimension of the solution space of the linear
system Ax = 0. This system can be solved by reducing its augmented matrix to reduced row echelon
form. The resulting matrix will be identical to (1), except that it will have an additional last column of
zeros, and hence the corresponding system of equations will be

$$\begin{aligned} x_1 - 4x_3 - 28x_4 - 37x_5 + 13x_6 &= 0 \\ x_2 - 2x_3 - 12x_4 - 16x_5 + 5x_6 &= 0 \end{aligned}$$

Solving these equations for the leading variables yields

$$\begin{aligned} x_1 &= 4x_3 + 28x_4 + 37x_5 - 13x_6 \\ x_2 &= 2x_3 + 12x_4 + 16x_5 - 5x_6 \end{aligned}$$

from which we obtain the general solution

$$\begin{aligned} x_1 &= 4r + 28s + 37t - 13u \\ x_2 &= 2r + 12s + 16t - 5u \\ x_3 &= r \\ x_4 &= s \\ x_5 &= t \\ x_6 &= u \end{aligned}$$

or in column vector form

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{bmatrix} = r \begin{bmatrix} 4 \\ 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + s \begin{bmatrix} 28 \\ 12 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t \begin{bmatrix} 37 \\ 16 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} + u \begin{bmatrix} -13 \\ -5 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \qquad (3)$$

Because the four vectors on the right side of (3) form a basis for the solution space, nullity(A) = 4.
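
The computation in Example 1 can be reproduced with a short program. The following is a sketch under our own naming (`rref` and `null_space_basis` are hypothetical helpers, not notation from the text): the rank is the number of pivot columns, and one null space basis vector is built for each free column by setting that free variable to 1 and the other free variables to 0.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and 0-based pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space_basis(A):
    """Return (rank, basis of the null space of A)."""
    R, pivots = rref(A)
    n = len(A[0])
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)                  # this free variable equals 1
        for i, p in enumerate(pivots):
            x[p] = -R[i][f]                 # solve for the leading variables
        basis.append(x)
    return len(pivots), basis

# The matrix A of Example 1.
A = [[-1, 2, 0, 4, 5, -3],
     [3, -7, 2, 0, 1, 4],
     [2, -5, 2, 4, 6, 1],
     [4, -9, 2, -4, -4, 7]]

rank, basis = null_space_basis(A)
nullity = len(basis)
print(rank, nullity)
for x in basis:
    print([int(v) for v in x])
```

The four vectors printed are the ones multiplying r, s, t, and u in (3).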


EXAMPLE  2 Maximum  Value  for  Rank 

What  is  the  maximum  possible  rank  of  an  m x n matrix  A that  is  not  square? 

Since the row vectors of A lie in $R^n$ and the column vectors in $R^m$, the row space of A is
at most n-dimensional and the column space is at most m-dimensional. Since the rank of A is the
common dimension of its row and column space, it follows that the rank is at most the smaller of m
and n. We denote this by writing

$$\operatorname{rank}(A) \le \min(m, n)$$

in which min(m, n) is the minimum of m and n.


The  following  theorem  establishes  an  important  relationship  between  the  rank  and  nullity  of  a matrix. 


THEOREM 4.8.2 Dimension Theorem for Matrices

If A is a matrix with n columns, then

$$\operatorname{rank}(A) + \operatorname{nullity}(A) = n \qquad (4)$$

Proof Since A has n columns, the homogeneous linear system Ax = 0 has n unknowns (variables). These fall into
two distinct categories: the leading variables and the free variables. Thus,

[number of leading variables] + [number of free variables] = n

But the number of leading variables is the same as the number of leading 1's in the reduced row echelon form of A,
which is the rank of A; and the number of free variables is the same as the number of parameters in the general
solution of Ax = 0, which is the nullity of A. This yields Formula (4).


EXAMPLE  3 The  Sum  of  Rank  and  Nullity 

The matrix

$$A = \begin{bmatrix} -1 & 2 & 0 & 4 & 5 & -3 \\ 3 & -7 & 2 & 0 & 1 & 4 \\ 2 & -5 & 2 & 4 & 6 & 1 \\ 4 & -9 & 2 & -4 & -4 & 7 \end{bmatrix}$$

has 6 columns, so

rank(A) + nullity(A) = 6

This is consistent with Example 1, where we showed that

rank(A) = 2 and nullity(A) = 4


The  following  theorem,  which  summarizes  results  already  obtained,  interprets  rank  and  nullity  in  the  context  of  a 
homogeneous  linear  system. 


THEOREM  4.8.3 

If  A is  an  m x n matrix,  then 

(a) rank(A) = the number of leading variables in the general solution of Ax = 0.

(b) nullity(A) = the number of parameters in the general solution of Ax = 0.

EXAMPLE  4 Number  of  Parameters  in  a General  Solution 

Find the number of parameters in the general solution of Ax = 0 if A is a 5 x 7 matrix of rank 3.
From (4),

nullity(A) = n - rank(A) = 7 - 3 = 4

Thus  there  are  four  parameters. 


Equivalence  Theorem 

In  Theorem  2.3.8  we  listed  seven  results  that  are  equivalent  to  the  invertibility  of  a square  matrix  A.  We  are  now  in 
a position  to  add  eight  more  results  to  that  list  to  produce  a single  theorem  that  summarizes  most  of  the  topics  we 
have  covered  thus  far. 


THEOREM  4.8.4  Equivalent  Statements 


If  A is  an  n x n matrix,  then  the  following  statements  are  equivalent. 

(a)  A is  invertible. 

(b)  Ax  = 0 has  only  the  trivial  solution. 

(c) The reduced row echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices.

(e) Ax = b is consistent for every n x 1 matrix b.

(f) Ax = b has exactly one solution for every n x 1 matrix b.

(g) det(A) ≠ 0.

(h)  The  column  vectors  of  A are  linearly  independent. 

(i)  The  row  vectors  of  A are  linearly  independent. 

(j)  The  column  vectors  of  A span  Rn. 

(k)  The  row  vectors  of  A span  Rn. 

(l)  The  column  vectors  of  A form  a basis  for  Rn. 

(m)  The  row  vectors  of  A form  a basis  for  R n. 

(n)  A has  rank  n. 

(o)  A has  nullity  0. 

Proof The equivalence of (h) through (m) follows from Theorem 4.5.4 (we omit the details). To complete the
proof we will show that (b), (n), and (o) are equivalent by proving the chain of implications
(b) ⇒ (o) ⇒ (n) ⇒ (b).

(b) ⇒ (o) If Ax = 0 has only the trivial solution, then there are no parameters in that solution, so nullity(A) = 0
by Theorem 4.8.3b.

(o) ⇒ (n) Theorem 4.8.2.

(n) ⇒ (b) If A has rank n, then Theorem 4.8.3a implies that there are n leading variables (hence no free variables)
in the general solution of Ax = 0. This leaves the trivial solution as the only possibility.


Overdetermined  and  Underdetermined  Systems 

In  many  applications  the  equations  in  a linear  system  correspond  to  physical  constraints  or  conditions  that  must  be 
satisfied.  In  general,  the  most  desirable  systems  are  those  that  have  the  same  number  of  constraints  as  unknowns, 
since  such  systems  often  have  a unique  solution.  Unfortunately,  it  is  not  always  possible  to  match  the  number  of 
constraints  and  unknowns,  so  researchers  are  often  faced  with  linear  systems  that  have  more  constraints  than 
unknowns,  called  overdetermined  systems , or  with  fewer  constraints  than  unknowns,  called  underdetermined 
systems.  The  following  two  theorems  will  help  us  to  analyze  both  overdetermined  and  underdetermined  systems. 


In  engineering  and  other  applications,  the 
occurrence  of  an  overdetermined  or 
underdetermined  linear  system  often  signals  that 
one  or  more  variables  were  omitted  in  formulating 
the  problem  or  that  extraneous  variables  were 
included.  This  often  leads  to  some  kind  of 
undesirable  physical  result. 


THEOREM  4.8.5 

If Ax = b is a consistent linear system of m equations in n unknowns, and if A has rank r, then the general
solution of the system contains n - r parameters.


Proof It follows from Theorem 4.7.2 that the number of parameters is equal to the nullity of A, which, by
Theorem 4.8.2, is n - r.


THEOREM  4.8.6 

Let A be an m x n matrix.

(a) (Overdetermined Case) If m > n, then the linear system Ax = b is inconsistent for at least one vector
b in $R^m$.

(b) (Underdetermined Case) If m < n, then for each vector b in $R^m$ the linear system Ax = b is either
inconsistent or has infinitely many solutions.


Proof (a) Assume that m > n, in which case the column vectors of A cannot span $R^m$ (fewer vectors than the
dimension of $R^m$). Thus, there is at least one vector b in $R^m$ that is not in the column space of A, and for that b the
system Ax = b is inconsistent by Theorem 4.7.1.

Proof (b) Assume that m < n. For each vector b in $R^m$ there are two possibilities: either the system Ax = b is
consistent or it is inconsistent. If it is inconsistent, then the proof is complete. If it is consistent, then Theorem 4.8.5
implies that the general solution has n - r parameters, where r = rank(A). But rank(A) is at most the smaller of m and n
(which is m in this case), so

$$n - r \ge n - m > 0$$

This means that the general solution has at least one parameter and hence there are infinitely many solutions.


EXAMPLE  5 Overdetermined  and  Underdetermined  Systems 


(a) What can you say about the solutions of an overdetermined system Ax = b of 7 equations in 5
unknowns in which A has rank r = 4?

(b) What can you say about the solutions of an underdetermined system Ax = b of 5 equations in 7
unknowns in which A has rank r = 4?

Solution

(a) The system is consistent for some vector b in $R^7$, and for any such b the number of parameters in
the general solution is n - r = 5 - 4 = 1.

(b) The system may be consistent or inconsistent, but if it is consistent for the vector b in $R^5$, then the
general solution has n - r = 7 - 4 = 3 parameters.

EXAMPLE  6 An  Overdetermined  System 

The linear system

$$\begin{aligned} x_1 - 2x_2 &= b_1 \\ x_1 - x_2 &= b_2 \\ x_1 + x_2 &= b_3 \\ x_1 + 2x_2 &= b_4 \\ x_1 + 3x_2 &= b_5 \end{aligned}$$

is overdetermined, so it cannot be consistent for all possible values of $b_1$, $b_2$, $b_3$, $b_4$, and $b_5$. Exact
conditions under which the system is consistent can be obtained by solving the linear system by Gauss-Jordan
elimination. We leave it for you to show that the augmented matrix is row equivalent to

$$\begin{bmatrix} 1 & 0 & 2b_2 - b_1 \\ 0 & 1 & b_2 - b_1 \\ 0 & 0 & b_3 - 3b_2 + 2b_1 \\ 0 & 0 & b_4 - 4b_2 + 3b_1 \\ 0 & 0 & b_5 - 5b_2 + 4b_1 \end{bmatrix} \qquad (5)$$

Thus, the system is consistent if and only if $b_1$, $b_2$, $b_3$, $b_4$, and $b_5$ satisfy the conditions

$$\begin{aligned} 2b_1 - 3b_2 + b_3 &= 0 \\ 3b_1 - 4b_2 + b_4 &= 0 \\ 4b_1 - 5b_2 + b_5 &= 0 \end{aligned}$$

Solving this homogeneous linear system yields

$$b_1 = 5r - 4s, \quad b_2 = 4r - 3s, \quad b_3 = 2r - s, \quad b_4 = r, \quad b_5 = s$$

where r and s are arbitrary.

The coefficient matrix for the linear system in the last example has n = 2 columns, and it has rank r = 2
because there are two nonzero rows in its reduced row echelon form. This implies that when the system is
consistent its general solution will contain n - r = 0 parameters; that is, the solution will be unique. With a
moment's thought, you should be able to see that this is so from (5).
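
The consistency conditions of Example 6 can be spot-checked numerically. The sketch below is an illustration only; it uses the standard fact that Ax = b is consistent exactly when A and the augmented matrix [A | b] have the same rank, and the names `rref`, `rank`, `is_consistent`, and `b_from` are our own.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and 0-based pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def rank(rows):
    return len(rref(rows)[1])

def is_consistent(A, b):
    """Ax = b is consistent exactly when rank(A) equals rank([A | b])."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(augmented)

def b_from(r, s):
    """Right-hand side built from the parametric solution in Example 6."""
    return [5 * r - 4 * s, 4 * r - 3 * s, 2 * r - s, r, s]

# Coefficient matrix of the system in Example 6.
A = [[1, -2], [1, -1], [1, 1], [1, 2], [1, 3]]

print(is_consistent(A, b_from(1, 2)))        # b satisfies the conditions
print(is_consistent(A, [1, 0, 0, 0, 0]))     # b violates 2*b1 - 3*b2 + b3 = 0
```

Any choice of r and s in `b_from` gives a consistent system, while a right-hand side that breaks one of the three conditions does not.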


The  Fundamental  Spaces  of  a Matrix 

There are six important vector spaces associated with a matrix A and its transpose $A^T$:

row space of A          row space of $A^T$
column space of A       column space of $A^T$
null space of A         null space of $A^T$

However, transposing a matrix converts row vectors into column vectors and conversely, so except for a difference
in notation, the row space of $A^T$ is the same as the column space of A, and the column space of $A^T$ is the same as
the row space of A. Thus, of the six spaces listed above, only the following four are distinct:

row space of A          column space of A
null space of A         null space of $A^T$

If A is an m x n matrix, then the row space and
null space of A are subspaces of $R^n$, and the
column space of A and the null space of $A^T$ are
subspaces of $R^m$.

These are called the fundamental spaces of a matrix A. We will conclude this section by discussing how these four
subspaces are related.

Let us focus for a moment on the matrix $A^T$. Since the row space and column space of a matrix have the same
dimension, and since transposing a matrix converts its columns to rows and its rows to columns, the following
result should not be surprising.


THEOREM  4.8.7 

If A is any matrix, then $\operatorname{rank}(A) = \operatorname{rank}(A^T)$.


Proof

$$\operatorname{rank}(A) = \dim(\text{row space of } A) = \dim(\text{column space of } A^T) = \operatorname{rank}(A^T)$$


This result has some important implications. For example, if A is an m x n matrix, then applying Formula (4) to the
matrix $A^T$ and using the fact that this matrix has m columns yields

$$\operatorname{rank}(A^T) + \operatorname{nullity}(A^T) = m$$

which, by virtue of Theorem 4.8.7, can be rewritten as

$$\operatorname{rank}(A) + \operatorname{nullity}(A^T) = m \qquad (6)$$

This alternative form of Formula (4) in Theorem 4.8.2 makes it possible to express the dimensions of all four
fundamental spaces in terms of the size and rank of A. Specifically, if rank(A) = r, then

$$\begin{aligned} \dim[\operatorname{row}(A)] &= r & \dim[\operatorname{col}(A)] &= r \\ \dim[\operatorname{null}(A)] &= n - r & \dim[\operatorname{null}(A^T)] &= m - r \end{aligned} \qquad (7)$$
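
As a numerical illustration of the four formulas in (7), the sketch below computes the dimensions of the fundamental spaces of the 4 x 6 matrix from Example 1 by row reducing both A and its transpose. The names `rref` and `fundamental_dimensions` are our own, and each null space dimension is obtained by counting the non-pivot (free) columns of the corresponding matrix.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row echelon form and 0-based pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def fundamental_dimensions(A):
    """Dimensions of row(A), col(A), null(A), and null(A^T)."""
    m, n = len(A), len(A[0])
    A_T = [list(col) for col in zip(*A)]
    pivots_A = rref(A)[1]        # pivot columns of A
    pivots_T = rref(A_T)[1]      # pivot columns of the transpose
    return {
        "row": len(pivots_A),            # number of leading 1's for A
        "col": len(pivots_T),            # row space of A^T = column space of A
        "null": n - len(pivots_A),       # free columns of A
        "null_T": m - len(pivots_T),     # free columns of A^T
    }

# The 4 x 6 matrix of Example 1, which has rank r = 2.
A = [[-1, 2, 0, 4, 5, -3],
     [3, -7, 2, 0, 1, 4],
     [2, -5, 2, 4, 6, 1],
     [4, -9, 2, -4, -4, 7]]

dims = fundamental_dimensions(A)
print(dims)
```

With m = 4, n = 6, and r = 2, the four values agree with r, r, n - r, and m - r.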

The four formulas in (7) provide an algebraic relationship between the size of a matrix and the dimensions of its
fundamental spaces. Our next objective is to find a geometric relationship between the fundamental spaces
themselves. For this purpose recall from Theorem 3.4.3 that if A is an m x n matrix, then the null space of A
consists of those vectors that are orthogonal to each of the row vectors of A. To develop that idea in more detail, we
make the following definition.


DEFINITION  2 

If W is a subspace of $R^n$, then the set of all vectors in $R^n$ that are orthogonal to every vector in W is called
the orthogonal complement of W and is denoted by the symbol $W^\perp$.


The  following  theorem  lists  three  basic  properties  of  orthogonal  complements.  We  will  omit  the  formal  proof 
because  a more  general  version  of  this  theorem  will  be  given  later  in  the  text. 


THEOREM  4.8.8 

If W is a subspace of $R^n$, then:

(a) $W^\perp$ is a subspace of $R^n$.

(b) The only vector common to W and $W^\perp$ is 0.

(c) The orthogonal complement of $W^\perp$ is W.


EXAMPLE  7 Orthogonal  Complements 

In $R^2$ the orthogonal complement of a line W through the origin is the line through the origin that is
perpendicular to W (Figure 4.8.1a); and in $R^3$ the orthogonal complement of a plane W through the
origin is the line through the origin that is perpendicular to that plane (Figure 4.8.1b).


Figure 4.8.1 (panels (a) and (b))


Explain why {0} and $R^n$ are orthogonal
complements.


A Geometric  Link  Between  the  Fundamental  Spaces 

The  following  theorem  provides  a geometric  link  between  the  fundamental  spaces  of  a matrix.  Part  (a)  is  essentially 
a restatement of Theorem 3.4.3 in the language of orthogonal complements, and part (b), whose proof is left as an
exercise,  follows  from  part  (a).  The  essential  idea  of  the  theorem  is  illustrated  in  Figure  4.8.2. 


THEOREM  4.8.9 

If  A is  an  m x n matrix,  then: 

(a) The null space of A and the row space of A are orthogonal complements in R^n.

(b) The null space of A^T and the column space of A are orthogonal complements in R^m.
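
As an informal numerical check of Theorem 4.8.9 and the dimension formulas, the following sketch computes a null space basis by exact row reduction and tests orthogonality against the rows. The helper names (rref, null_space_basis, dot) are illustrative choices, not notation from the text, and the matrix is the one from Exercise 1 of this section.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot columns), using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for c in range(len(m[0])):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def null_space_basis(rows):
    """One basis vector of the null space per free variable."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -R[i][f]
        basis.append(v)
    return basis

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[1, 2, 4, 0], [-3, 1, 5, 2], [-2, 3, 9, 2]]   # m = 3, n = 4
At = [list(col) for col in zip(*A)]                 # transpose of A

rank = len(rref(A)[1])
null_A = null_space_basis(A)      # should have n - r vectors
null_At = null_space_basis(At)    # should have m - r vectors

# Part (a): every row of A is orthogonal to every null space vector of A.
rows_orthogonal = all(dot(row, v) == 0 for row in A for v in null_A)
# Part (b): every column of A (row of A^T) is orthogonal to null(A^T).
cols_orthogonal = all(dot(col, v) == 0 for col in At for v in null_At)
```

Here rank + len(null_A) equals n and rank + len(null_At) equals m, in agreement with the Dimension Theorem and Formula (6).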


More  on  the  Equivalence  Theorem 


As  our  final  result  in  this  section,  we  will  add  two  more  statements  to  Theorem  4.8.4.  We  leave  the  proof  that  those 
statements  are  equivalent  to  the  rest  as  an  exercise. 


Equivalent  Statements 

If  A is  an  n x n matrix,  then  the  following  statements  are  equivalent. 

(a)  A is  invertible. 

(b)  Ax  = 0 has  only  the  trivial  solution. 

(c) The reduced row echelon form of A is I_n.

(d)  A is  expressible  as  a product  of  elementary  matrices. 

(e) Ax = b is consistent for every n × 1 matrix b.

(f) Ax = b has exactly one solution for every n × 1 matrix b.

(g) det(A) ≠ 0.

(h)  The  column  vectors  of  A are  linearly  independent. 

(i)  The  row  vectors  of  A are  linearly  independent. 

(j)  The  column  vectors  of  A span  Rn. 

(k)  The  row  vectors  of  A span  Rn. 

(l)  The  column  vectors  of  A form  a basis  for  R n. 

(m)  The  row  vectors  of  A form  a basis  for  Rn. 

(n) A has rank n.

(o)  A has  nullity  0. 

(p) The orthogonal complement of the null space of A is R^n.

(q) The orthogonal complement of the row space of A is {0}.


Applications  of  Rank 

The  advent  of  the  Internet  has  stimulated  research  on  finding  efficient  methods  for  transmitting  large  amounts  of 
digital  data  over  communications  lines  with  limited  bandwidths.  Digital  data  are  commonly  stored  in  matrix  form, 
and  many  techniques  for  improving  transmission  speed  use  the  rank  of  a matrix  in  some  way.  Rank  plays  a role 
because it measures the “redundancy” in a matrix in the sense that if A is an m × n matrix of rank k, then n − k of
the column vectors and m − k of the row vectors can be expressed in terms of k linearly independent column or
row  vectors.  The  essential  idea  in  many  data  compression  schemes  is  to  approximate  the  original  data  set  by  a data 
set  with  smaller  rank  that  conveys  nearly  the  same  information,  then  eliminate  redundant  vectors  in  the 
approximating  set  to  speed  up  the  transmission  time. 
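
To make the redundancy idea concrete, here is a minimal sketch (not a compression scheme from the text) of the extreme case of rank 1: every row of the matrix is a multiple of one vector, so the matrix can be stored as two vectors instead of a full array. The names outer, u, and v are illustrative assumptions.

```python
def outer(u, v):
    """Build the matrix whose (i, j) entry is u[i] * v[j]; it has rank 1 when u and v are nonzero."""
    return [[a * b for b in v] for a in u]

u = [1, 2, 3]
v = [4, 5, 6, 7]
A = outer(u, v)                      # a 3 x 4 matrix of rank 1

stored_full = len(A) * len(A[0])     # entries needed to store A directly
stored_factored = len(u) + len(v)    # entries needed to store u and v instead

# Every row of A is a scalar multiple of v, so the rows are highly redundant.
rows_are_multiples = all(A[i] == [u[i] * b for b in v] for i in range(len(u)))
```

For this small example the factored form needs 7 numbers rather than 12; the saving grows with the size of the matrix.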


Concept  Review 

Rank 

Nullity 

Dimension  Theorem 
Overdetermined  system 
Underdetermined  system 
Fundamental  spaces  of  a matrix 
Relationships  among  the  fundamental  spaces 
Orthogonal  complement 

Equivalent  characterizations  of  invertible  matrices 

Skills 

Find  the  rank  and  nullity  of  a matrix. 

Find  the  dimension  of  the  row  space  of  a matrix. 


Exercise  Set  4.8 

1. Verify that rank(A) = rank(A^T).

    [ 1   2   4   0]
A = [−3   1   5   2]
    [−2   3   9   2]

Answer:

rank(A) = rank(A^T) = 2

2.  Find  the  rank  and  nullity  of  the  matrix;  then  verify  that  the  values  obtained  satisfy  Formula  4 in  the  Dimension 
Theorem. 


(a)
    [1  −1   3]
A = [5  −4  −4]
    [7  −6   2]

(b)
    [2   0  −1]
A = [4   0  −2]
    [0   0   0]

(c)
    [ 1   4   5   2]
A = [ 2   1   3   0]
    [−1   3   2   2]

(d) 

1 4 

5 

2 3 5 7 8 


(e)
    [ 1  −3   2   2   1]
    [ 0   3   6   0  −3]
A = [ 2  −3  −2   4   4]
    [ 3  −6   0   6   5]
    [−2   9   2  −4  −5]

3.  In  each  part  of  Exercise  2,  use  the  results  obtained  to  find  the  number  of  leading  variables  and  the  number  of 
parameters  in  the  solution  of  Ax  = 0 without  solving  the  system. 

Answer: 

(a)  2;  1 

(b)  1;  2 

(c)  2;  2 

(d)  2;  3 

(e)  3;  2 

4.  In  each  part,  use  the  information  in  the  table  to  find  the  dimension  of  the  row  space  of  A,  column  space  of  A, 
null space of A, and null space of A^T.


                (a)   (b)   (c)   (d)   (e)   (f)   (g)
Size of A       3×3   3×3   3×3   5×9   9×5   4×4   6×2
Rank(A)         3     2     1     2     2     0     2

5.  In  each  part,  find  the  largest  possible  value  for  the  rank  of  A and  the  smallest  possible  value  for  the  nullity  of  A. 

(a)  A is 4x4 

(b)  A is  3 x 5 

(c)  A is  5 x 3 

Answer: 

(a)  Rank  = 4,  nullity  = 0 

(b)  Rank  = 3,  nullity  = 2 

(c)  Rank  = 3,  nullity  = 0 

6.  If  A is  an  m x n matrix,  what  is  the  largest  possible  value  for  its  rank  and  the  smallest  possible  value  for  its 
nullity? 

7. In each part, use the information in the table to determine whether the linear system Ax = b is consistent. If so,
state the number of parameters in its general solution.

                (a)   (b)   (c)   (d)   (e)   (f)   (g)
Size of A       3×3   3×3   3×3   5×9   5×9   4×4   6×2
Rank(A)         3     2     1     2     2     0     2
Rank[A | b]     3     3     1     2     3     0     2


Answer: 


(a)  Yes,  0 

(b)  No 

(c)  Yes,  2 

(d)  Yes,  7 

(e)  No 

(f)  Yes,  4 

(g)  Yes,  0 

8.  For  each  of  the  matrices  in  Exercise  7,  find  the  nullity  of  A,  and  determine  the  number  of  parameters  in  the 
general solution of the homogeneous linear system Ax = 0.

9. What conditions must be satisfied by b1, b2, b3, b4, and b5 for the overdetermined linear system

x1 − 3x2 = b1
x1 − 2x2 = b2
x1 +  x2 = b3
x1 − 4x2 = b4
x1 + 5x2 = b5

to be consistent?


Answer:


b1 = r, b2 = s, b3 = 4s − 3r, b4 = 2r − s, b5 = 8s − 7r

10. Let

A = [a11  a12  a13]
    [a21  a22  a23]

Show that A has rank 2 if and only if one or more of the determinants

|a11  a12|     |a11  a13|     |a12  a13|
|a21  a22| ,   |a21  a23| ,   |a22  a23|

is nonzero.

11.  Suppose  that  A is  a 3 x 3 matrix  whose  null  space  is  a line  through  the  origin  in  3-space.  Can  the  row  or  column 
space  of  A also  be  a line  through  the  origin?  Explain. 

Answer: 

No 

12.  Discuss  how  the  rank  of  A varies  with  t. 

(a)
    [1  1  t]
A = [1  t  1]
    [t  1  1]

(b)
    [ t   3  −1]
A = [ 3   6  −2]
    [−1  −3   t]


13.  Are  there  values  of  r and  s for  which 

[1     0       0  ]
[0   r − 2     2  ]
[0   s − 1   r + 2]
[0     0       3  ]

has rank 1? Has rank 2? If so, find those values.

Answer:

Rank is 2 if r = 2 and s = 1; the rank is never 1.

14. Use the result in Exercise 10 to show that the set of points (x, y, z) in R^3 for which the matrix

[x  y  z]
[1  x  y]

has rank 1 is the curve with parametric equations x = t, y = t^2, z = t^3.

15. Prove: If k ≠ 0, then A and kA have the same rank.

16. (a) Give an example of a 3 × 3 matrix whose column space is a plane through the origin in 3-space.

(b) What kind of geometric object is the null space of your matrix?

(c) What kind of geometric object is the row space of your matrix?

17. (a) If A is a 3 × 5 matrix, then the number of leading 1's in the reduced row echelon form of A is at most
_____. Why?

(b) If A is a 3 × 5 matrix, then the number of parameters in the general solution of Ax = 0 is at most
_____. Why?

(c) If A is a 5 × 3 matrix, then the number of leading 1's in the reduced row echelon form of A is at most
_____. Why?

(d) If A is a 5 × 3 matrix, then the number of parameters in the general solution of Ax = 0 is at most
_____. Why?

Answer: 

(a)  3 

(b)  5 

(c)  3 

(d)  3 


18. (a) If A is a 3 × 5 matrix, then the rank of A is at most _____. Why?

(b) If A is a 3 × 5 matrix, then the nullity of A is at most _____. Why?

(c) If A is a 3 × 5 matrix, then the rank of A^T is at most _____. Why?

(d) If A is a 3 × 5 matrix, then the nullity of A^T is at most _____. Why?


19. Find matrices A and B for which rank(A) = rank(B), but rank(A^2) ≠ rank(B^2).


Answer: 


20.  Prove:  If  a matrix  A is  not  square,  then  either  the  row  vectors  or  the  column  vectors  of  A are  linearly  dependent. 

True-False  Exercises 

In  parts  (a)-(j)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  Either  the  row  vectors  or  the  column  vectors  of  a square  matrix  are  linearly  independent. 

Answer: 

False 

(b)  A matrix  with  linearly  independent  row  vectors  and  linearly  independent  column  vectors  is  square. 

Answer: 

True 

(c) The nullity of a nonzero m × n matrix is at most m.

Answer: 

False 

(d)  Adding  one  additional  column  to  a matrix  increases  its  rank  by  one. 

Answer: 

False 

(e)  The  nullity  of  a square  matrix  with  linearly  dependent  rows  is  at  least  one. 

Answer: 

True 

(f)  If  A is  square  and  Ax  = b is  inconsistent  for  some  vector  b,  then  the  nullity  of  A is  zero. 

Answer: 

False 

(g)  If  a matrix  A has  more  rows  than  columns,  then  the  dimension  of  the  row  space  is  greater  than  the  dimension  of 
the  column  space. 

Answer: 

False 

(h) If rank(A^T) = rank(A), then A is square.

Answer: 

False 

(i)  There  is  no  3 x 3 matrix  whose  row  space  and  null  space  are  both  lines  in  3 -space. 


Answer: 


True 

(j) If V is a subspace of R^n and W is a subspace of V, then W^⊥ is a subspace of V^⊥.

Answer: 

False 

4.9 Matrix Transformations from R^n to R^m

In  this  section  we  will  study  functions  of  the  form  w=F(x),  where  the  independent  variable  x is  a vector  in  Rn  and  the 
dependent  variable  w is  a vector  in  Rm.  We  will  concentrate  on  a special  class  of  such  functions  called  “matrix 
transformations.”  Such  transformations  are  fundamental  in  the  study  of  linear  algebra  and  have  important  applications 
in  physics,  engineering,  social  sciences,  and  various  branches  of  mathematics. 


Functions  and  Transformations 

Recall  that  a function  is  a rule  that  associates  with  each  element  of  a set  A one  and  only  one  element  in  a set  B.  If  f 
associates  the  element  b with  the  element  a , then  we  write 

b = f(a) 

and we say that b is the image of a under f or that f(a) is the value of f at a. The set A is called the domain of f and the
set B the codomain of f (Figure 4.9.1). The subset of the codomain that consists of all images of points in the domain is
called the range of f.


Figure 4.9.1 (f maps a in the domain A to b = f(a) in the codomain B)

For  many  common  functions  the  domain  and  codomain  are  sets  of  real  numbers,  but  in  this  text  we  will  be  concerned 
with  functions  for  which  the  domain  and  codomain  are  vector  spaces. 


DEFINITION  1 

If V and W are vector spaces, and if f is a function with domain V and codomain W, then we say that f is a
transformation from V to W or that f maps V to W, which we denote by writing

f: V → W

In the special case where V = W, the transformation is also called an operator on V.


In this section we will be concerned exclusively with transformations from R^n to R^m; transformations of general vector
spaces will be considered in a later section. To illustrate one way in which such transformations can arise, suppose that
f1, f2, ..., fm are real-valued functions of n variables, say

w1 = f1(x1, x2, ..., xn)
w2 = f2(x1, x2, ..., xn)
 ⋮
wm = fm(x1, x2, ..., xn)          (1)

These m equations assign a unique point (w1, w2, ..., wm) in R^m to each point (x1, x2, ..., xn) in R^n and thus define a
transformation from R^n to R^m. If we denote this transformation by T, then T: R^n → R^m and

T(x1, x2, ..., xn) = (w1, w2, ..., wm)


Matrix  Transformations 


In the special case where the equations in (1) are linear, they can be expressed in the form

w1 = a11 x1 + a12 x2 + ... + a1n xn
w2 = a21 x1 + a22 x2 + ... + a2n xn
 ⋮
wm = am1 x1 + am2 x2 + ... + amn xn          (2)

which we can write in matrix notation as

[w1]   [a11  a12  ...  a1n] [x1]
[w2] = [a21  a22  ...  a2n] [x2]
[ ⋮ ]   [ ⋮    ⋮         ⋮ ] [ ⋮ ]
[wm]   [am1  am2  ...  amn] [xn]          (3)

or more briefly as

w = Ax          (4)



Although we could view this as a linear system, we will view it instead as a transformation that maps the column vector
x in R^n into the column vector w in R^m by multiplying x on the left by A. We call this a matrix transformation (or
matrix operator if m = n), and we denote it by T_A: R^n → R^m. With this notation, Equation (4) can be expressed as

w = T_A(x)          (5)


The matrix transformation T_A is called multiplication by A, and the matrix A is called the standard matrix for the
transformation.


We will also find it convenient, on occasion, to express (5) in the schematic form

x --T_A--> w          (6)

which is read “T_A maps x into w.”

EXAMPLE  1 A Matrix  Transformation  from  R4  to  R3 

The matrix transformation T: R^4 → R^3 defined by the equations

w1 = 2x1 − 3x2 +  x3 − 5x4
w2 = 4x1 +  x2 − 2x3 +  x4
w3 = 5x1 −  x2 + 4x3               (7)

can be expressed in matrix form as

[w1]   [2  −3   1  −5] [x1]
[w2] = [4   1  −2   1] [x2]
[w3]   [5  −1   4   0] [x3]
                       [x4]          (8)

so the standard matrix for T is

    [2  −3   1  −5]
A = [4   1  −2   1]
    [5  −1   4   0]

The image of a point (x1, x2, x3, x4) can be computed directly from the defining equations (7) or from (8)
by matrix multiplication. For example, if

(x1, x2, x3, x4) = (1, −3, 0, 2)

then substituting in (7) yields w1 = 1, w2 = 3, w3 = 8 (verify), or alternatively from (8),

[w1]   [2  −3   1  −5] [ 1]   [1]
[w2] = [4   1  −2   1] [−3] = [3]
[w3]   [5  −1   4   0] [ 0]   [8]
                       [ 2]

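
The computation in Example 1 can be reproduced with a few lines of code. This is only a sketch; the function name mat_vec is an illustrative choice and plain Python lists stand in for matrices and column vectors.

```python
def mat_vec(A, x):
    """Return the product Ax, where A is a list of rows and x is a list of entries."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Standard matrix of the transformation T in Example 1.
A = [[2, -3, 1, -5],
     [4, 1, -2, 1],
     [5, -1, 4, 0]]

x = [1, -3, 0, 2]
w = mat_vec(A, x)   # image of x under T; compare with w1 = 1, w2 = 3, w3 = 8 in the example
```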

Some  Notational  Matters 

Sometimes  we  will  want  to  denote  a matrix  transformation  without  giving  a name  to  the  matrix  itself.  In  such  cases  we 
will denote the standard matrix for T: R^n → R^m by the symbol [T]. Thus, the equation

T(x) = [T]x          (9)

is simply the statement that T is a matrix transformation with standard matrix [T], and the image of x under this
transformation is the product of the matrix [T] and the column vector x.


Properties  of  Matrix  Transformations 

The  following  theorem  lists  four  basic  properties  of  matrix  transformations  that  follow  from  properties  of  matrix 
multiplication. 


THEOREM  4.9.1 

For every matrix A the matrix transformation T_A: R^n → R^m has the following properties for all vectors u and v
in R^n and for every scalar k:

(a) T_A(0) = 0

(b) T_A(ku) = kT_A(u)   [Homogeneity property]

(c) T_A(u + v) = T_A(u) + T_A(v)   [Additivity property]

(d) T_A(u − v) = T_A(u) − T_A(v)


All four parts are restatements of familiar properties of matrix multiplication:

A0 = 0,  A(ku) = k(Au),  A(u + v) = Au + Av,  A(u − v) = Au − Av
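
The four properties can be spot-checked numerically. The sketch below is a check on one choice of A, u, v, and k, not a proof; the matrix is borrowed from Example 1, and the vectors, the scalar, and the helper names are arbitrary illustrative choices.

```python
def mat_vec(A, x):
    """Return the product Ax for a matrix given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def add(p, q):
    return [a + b for a, b in zip(p, q)]

def sub(p, q):
    return [a - b for a, b in zip(p, q)]

def scale(c, p):
    return [c * a for a in p]

A = [[2, -3, 1, -5], [4, 1, -2, 1], [5, -1, 4, 0]]
u = [1, -3, 0, 2]
v = [0, 2, -1, 4]
k = 3

zero_ok = mat_vec(A, [0, 0, 0, 0]) == [0, 0, 0]                               # part (a)
homogeneity_ok = mat_vec(A, scale(k, u)) == scale(k, mat_vec(A, u))           # part (b)
additivity_ok = mat_vec(A, add(u, v)) == add(mat_vec(A, u), mat_vec(A, v))    # part (c)
difference_ok = mat_vec(A, sub(u, v)) == sub(mat_vec(A, u), mat_vec(A, v))    # part (d)
```

Integer entries are used so that the comparisons are exact.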


It follows from Theorem 4.9.1 that a matrix transformation maps linear combinations of vectors in R^n into the
corresponding linear combinations in R^m in the sense that

T_A(k1 u1 + k2 u2 + ... + kr ur) = k1 T_A(u1) + k2 T_A(u2) + ... + kr T_A(ur)          (10)


Depending on whether n-tuples and m-tuples are regarded as vectors or points, the geometric effect of a matrix
transformation T_A: R^n → R^m is to map each vector (point) in R^n into a vector (point) in R^m (Figure 4.9.2).


Figure 4.9.2 (left: T maps vectors to vectors; right: T maps points to points)


The  following  theorem  states  that  if  two  matrix  transformations  from  Rn  to  Rm  have  the  same  image  at  each  point  of 
Rn , then  the  matrices  themselves  must  be  the  same. 


THEOREM  4.9.2 

If T_A: R^n → R^m and T_B: R^n → R^m are matrix transformations, and if T_A(x) = T_B(x) for every vector x in R^n,
then A = B.


To say that T_A(x) = T_B(x) for every vector in R^n is the same as saying that

Ax = Bx

for every vector x in R^n. This is true, in particular, if x is any of the standard basis vectors e1, e2, ..., en for R^n; that is,

Ae_j = Be_j   (j = 1, 2, ..., n)          (11)

Since every entry of e_j is 0 except for the jth, which is 1, it follows from Theorem 1.3.1 that Ae_j is the jth column of A
and Be_j is the jth column of B. Thus, it follows from (11) that corresponding columns of A and B are the same, and hence
that A = B.


EXAMPLE  2 Zero  Transformations 


If 0 is the m × n zero matrix, then

T_0(x) = 0x = 0

so multiplication by zero maps every vector in R^n into the zero vector in R^m. We call T_0 the zero
transformation from R^n to R^m.


EXAMPLE  3 Identity  Operators 

If I is the n × n identity matrix, then

T_I(x) = Ix = x

so multiplication by I maps every vector in R^n into itself. We call T_I the identity operator on R^n.


A Procedure  for  Finding  Standard  Matrices 

There  is  a way  of  finding  the  standard  matrix  for  a matrix  transformation  from  Rn  to  Rm  by  considering  the  effect  of 
that transformation on the standard basis vectors for R^n. To explain the idea, suppose that A is unknown and that

e1, e2, ..., en

are the standard basis vectors for R^n. Suppose also that the images of these vectors under the transformation T_A are

T_A(e1) = Ae1,  T_A(e2) = Ae2,  ...,  T_A(en) = Aen

It follows from Theorem 1.3.1 that Ae_j is a linear combination of the columns of A in which the successive coefficients
are the entries of e_j. But all entries of e_j are zero except the jth, so the product Ae_j is just the jth column of the matrix
A. Thus,

A = [T_A(e1) | T_A(e2) | ... | T_A(en)]          (12)


In  summary,  we  have  the  following  procedure  for  finding  the  standard  matrix  for  a matrix  transformation: 


Finding  the  Standard  Matrix  for  a Matrix  Transformation 

Step 1. Find the images of the standard basis vectors e1, e2, ..., en for R^n in column form.

Step  2.  Construct  the  matrix  that  has  the  images  obtained  in  Step  1 as  its  successive  columns.  This  matrix  is  the 
standard  matrix  for  the  transformation. 
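
The two steps above can be sketched in code. Here the transformation is given only as a function (the one defined by the equations of Example 1), and the helper standard_matrix, a name chosen for illustration, recovers its matrix from the images of the standard basis vectors.

```python
def T(x):
    """The transformation of Example 1, given by its defining equations."""
    x1, x2, x3, x4 = x
    return [2 * x1 - 3 * x2 + x3 - 5 * x4,
            4 * x1 + x2 - 2 * x3 + x4,
            5 * x1 - x2 + 4 * x3]

def standard_matrix(transform, n):
    """Build the standard matrix of a matrix transformation on R^n."""
    # Step 1: images of the standard basis vectors e_1, ..., e_n.
    images = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        images.append(transform(e_j))
    # Step 2: use those images as successive columns (transpose the list of images).
    return [list(row) for row in zip(*images)]

A = standard_matrix(T, 4)
```

The result agrees with the matrix A displayed in Example 1. Note that the procedure is only meaningful when the function really is a matrix transformation.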




Reflection  Operators 


Some of the most basic matrix operators on R^2 and R^3 are those that map each point into its symmetric image about a
fixed line or a fixed plane; these are called reflection operators. Table 1 shows the standard matrices for the reflections
about the coordinate axes in R^2, and Table 2 shows the standard matrices for the reflections about the coordinate planes
in R^3. In each case the standard matrix was obtained by finding the images of the standard basis vectors, converting
those images to column vectors, and then using those column vectors as successive columns of the standard matrix.


Table  1 


Operator                            Images of e1 and e2             Standard Matrix

Reflection about the y-axis         T(e1) = T(1, 0) = (−1, 0)       [−1   0]
T(x, y) = (−x, y)                   T(e2) = T(0, 1) = (0, 1)        [ 0   1]

Reflection about the x-axis         T(e1) = T(1, 0) = (1, 0)        [ 1   0]
T(x, y) = (x, −y)                   T(e2) = T(0, 1) = (0, −1)       [ 0  −1]

Reflection about the line y = x     T(e1) = T(1, 0) = (0, 1)        [ 0   1]
T(x, y) = (y, x)                    T(e2) = T(0, 1) = (1, 0)        [ 1   0]


Table  2 


Projection  Operators 


Matrix operators on R^2 and R^3 that map each point into its orthogonal projection on a fixed line or plane are called
projection operators (or more precisely, orthogonal projection operators). Table 3 shows the standard matrices for the
orthogonal projections on the coordinate axes in R^2, and Table 4 shows the standard matrices for the orthogonal
projections on the coordinate planes in R^3.


Table  3 


Operator                                Images of e1 and e2            Standard Matrix

Orthogonal projection on the x-axis     T(e1) = T(1, 0) = (1, 0)       [1  0]
T(x, y) = (x, 0)                        T(e2) = T(0, 1) = (0, 0)       [0  0]

Orthogonal projection on the y-axis     T(e1) = T(1, 0) = (0, 0)       [0  0]
T(x, y) = (0, y)                        T(e2) = T(0, 1) = (0, 1)       [0  1]


Table  4 


Operator                                  Images of e1, e2, e3                 Standard Matrix

Orthogonal projection on the xy-plane     T(e1) = T(1, 0, 0) = (1, 0, 0)       [1  0  0]
T(x, y, z) = (x, y, 0)                    T(e2) = T(0, 1, 0) = (0, 1, 0)       [0  1  0]
                                          T(e3) = T(0, 0, 1) = (0, 0, 0)       [0  0  0]

Orthogonal projection on the xz-plane     T(e1) = T(1, 0, 0) = (1, 0, 0)       [1  0  0]
T(x, y, z) = (x, 0, z)                    T(e2) = T(0, 1, 0) = (0, 0, 0)       [0  0  0]
                                          T(e3) = T(0, 0, 1) = (0, 0, 1)       [0  0  1]

Orthogonal projection on the yz-plane     T(e1) = T(1, 0, 0) = (0, 0, 0)       [0  0  0]
T(x, y, z) = (0, y, z)                    T(e2) = T(0, 1, 0) = (0, 1, 0)       [0  1  0]
                                          T(e3) = T(0, 0, 1) = (0, 0, 1)       [0  0  1]
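
As a quick illustration of how the tabulated matrices act, the sketch below applies a reflection from Table 1 and a projection from Table 4 to sample points. The sample points and the helper name mat_vec are arbitrary illustrative choices.

```python
def mat_vec(A, x):
    """Return the product Ax for a matrix given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

reflect_about_y_axis = [[-1, 0],
                        [0, 1]]        # Table 1: T(x, y) = (-x, y)

project_on_xy_plane = [[1, 0, 0],
                       [0, 1, 0],
                       [0, 0, 0]]      # Table 4: T(x, y, z) = (x, y, 0)

reflected = mat_vec(reflect_about_y_axis, [3, 4])      # x-coordinate changes sign
projected = mat_vec(project_on_xy_plane, [3, 4, 5])    # z-coordinate is replaced by 0
```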

Rotation  Operators 


Matrix operators on R^2 and R^3 that move points along circular arcs are called rotation operators. Let us consider how
to find the standard matrix for the rotation operator T: R^2 → R^2 that moves points counterclockwise about the origin
through an angle θ (Figure 4.9.3). As illustrated in Figure 4.9.3, the images of the standard basis vectors are

T(e1) = T(1, 0) = (cos θ, sin θ)   and   T(e2) = T(0, 1) = (−sin θ, cos θ)

so the standard matrix for T is

[cos θ  −sin θ]
[sin θ   cos θ]

In keeping with common usage we will denote this operator by R_θ and call

R_θ = [cos θ  −sin θ]
      [sin θ   cos θ]          (13)

the rotation matrix for R^2. If x = (x, y) is a vector in R^2, and if w = (w1, w2) is its image under the rotation, then the
relationship w = R_θ x can be written in component form as

w1 = x cos θ − y sin θ
w2 = x sin θ + y cos θ          (14)

These are called the rotation equations for R^2. These ideas are summarized in Table 5.


Table  5 


Operator                        Rotation Equations            Standard Matrix

Rotation through an angle θ     w1 = x cos θ − y sin θ        [cos θ  −sin θ]
                                w2 = x sin θ + y cos θ        [sin θ   cos θ]

In the plane, counterclockwise angles are positive
and clockwise angles are negative. The rotation
matrix for a clockwise rotation of −θ radians can be
obtained by replacing θ by −θ in (13). After
simplification this yields

R_−θ = [ cos θ   sin θ]
       [−sin θ   cos θ]



EXAMPLE  4 A Rotation  Operator 


Find the image of x = (1, 1) under a rotation of π/6 radians (= 30°) about the origin.


It follows from (13) with θ = π/6 that

R_π/6 x = [√3/2   −1/2] [1]  =  [(√3 − 1)/2]  ≈  [0.37]
          [ 1/2   √3/2] [1]     [(1 + √3)/2]     [1.37]

or in comma-delimited notation, R_π/6(1, 1) ≈ (0.37, 1.37).
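
The arithmetic in Example 4 is easy to confirm numerically. In this sketch the function names rotation_matrix and mat_vec are illustrative choices, and the result is a floating-point approximation.

```python
import math

def rotation_matrix(theta):
    """Standard matrix for a counterclockwise rotation of R^2 through theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]

def mat_vec(A, x):
    """Return the product Ax for a matrix given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

w = mat_vec(rotation_matrix(math.pi / 6), [1, 1])
w_rounded = [round(t, 2) for t in w]   # two decimal places, as in the example
```

The exact values are (√3 − 1)/2 and (1 + √3)/2; the rounded values match those quoted in the example.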


Rotations  in  R3 

A rotation of vectors in R^3 is usually described in relation to a ray emanating from the origin, called the axis of
rotation. As a vector revolves around the axis of rotation, it sweeps out some portion of a cone (Figure 4.9.4a). The
angle of rotation, which is measured in the base of the cone, is described as “clockwise” or “counterclockwise” in
relation to a viewpoint that is along the axis of rotation looking toward the origin. For example, in Figure 4.9.4a the
vector w results from rotating the vector x counterclockwise around the axis l through an angle θ. As in R^2, angles are
positive if they are generated by counterclockwise rotations and negative if they are generated by clockwise rotations.


Figure 4.9.4 (a) Angle of rotation; (b) Right-hand rule

The most common way of describing a general axis of rotation is to specify a nonzero vector u that runs along the axis
of rotation and has its initial point at the origin. The counterclockwise direction for a rotation about the axis can then be
determined by a “right-hand rule” (Figure 4.9.4b): If the thumb of the right hand points in the direction of u, then the
cupped fingers point in a counterclockwise direction.

A rotation operator on R^3 is a matrix operator that rotates each vector in R^3 about some rotation axis through a fixed
angle θ. In Table 6 we have described the rotation operators on R^3 whose axes of rotation are the positive coordinate
axes. For each of these rotations one of the components is unchanged, and the relationships between the other
components can be derived by the same procedure used to derive (14). For example, in the rotation about the z-axis, the
z-components of x and w = T(x) are the same, and the x- and y-components are related as in (14). This yields the rotation
equation shown in the last row of Table 6.


Table  6 


Operator                                  Rotation Equations             Standard Matrix

Counterclockwise rotation about the       w1 = x                         [1     0        0   ]
positive x-axis through an angle θ        w2 = y cos θ − z sin θ         [0   cos θ   −sin θ]
                                          w3 = y sin θ + z cos θ         [0   sin θ    cos θ]

Counterclockwise rotation about the       w1 = x cos θ + z sin θ         [ cos θ   0   sin θ]
positive y-axis through an angle θ        w2 = y                         [   0     1     0  ]
                                          w3 = −x sin θ + z cos θ        [−sin θ   0   cos θ]

Counterclockwise rotation about the       w1 = x cos θ − y sin θ         [cos θ   −sin θ   0]
positive z-axis through an angle θ        w2 = x sin θ + y cos θ         [sin θ    cos θ   0]
                                          w3 = z                         [  0        0     1]


For completeness, we note that the standard matrix for a counterclockwise rotation through an angle θ about an axis in
R^3, which is determined by an arbitrary unit vector u = (a, b, c) that has its initial point at the origin, is

[a^2(1 − cos θ) + cos θ       ab(1 − cos θ) − c sin θ      ac(1 − cos θ) + b sin θ]
[ab(1 − cos θ) + c sin θ      b^2(1 − cos θ) + cos θ       bc(1 − cos θ) − a sin θ]
[ac(1 − cos θ) − b sin θ      bc(1 − cos θ) + a sin θ      c^2(1 − cos θ) + cos θ ]          (15)



The  derivation  can  be  found  in  the  book  Principles  of  Interactive  Computer  Graphics , by  W.  M.  Newman  and  R.  F. 
Sproull  (New  York:  McGraw-Hill,  1979).  You  may  find  it  instructive  to  derive  the  results  in  Table  6 as  special  cases  of 
this  more  general  result. 
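
One way to explore the suggested exercise is numerically. The sketch below codes the general axis-angle matrix and compares it with the z-axis row of Table 6 for one sample angle; the function names and the angle 0.7 are illustrative assumptions, and a numerical match at one angle is evidence, not a derivation.

```python
import math

def axis_rotation_matrix(u, theta):
    """Rotation through theta about the axis given by the unit vector u = (a, b, c)."""
    a, b, c = u
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    v = 1 - cos_t
    return [
        [a * a * v + cos_t,     a * b * v - c * sin_t, a * c * v + b * sin_t],
        [a * b * v + c * sin_t, b * b * v + cos_t,     b * c * v - a * sin_t],
        [a * c * v - b * sin_t, b * c * v + a * sin_t, c * c * v + cos_t],
    ]

def z_rotation_matrix(theta):
    """Rotation about the positive z-axis, as in the last row of Table 6."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0],
            [s, c, 0],
            [0, 0, 1]]

theta = 0.7
general = axis_rotation_matrix((0, 0, 1), theta)
special = z_rotation_matrix(theta)

# Largest entrywise difference; it should be zero up to rounding error.
max_diff = max(abs(general[i][j] - special[i][j]) for i in range(3) for j in range(3))
```

Substituting u = (1, 0, 0) or u = (0, 1, 0) in the same function gives the other two rows of Table 6.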


Dilations  and  Contractions 

If k is a nonnegative scalar, then the operator T(x) = kx on R^2 or R^3 has the effect of increasing or decreasing the
length of each vector by a factor of k. If 0 < k < 1 the operator is called a contraction with factor k, and if k > 1 it is
called a dilation with factor k (Figure 4.9.5). If k = 1, then T is the identity operator and can be regarded either as a
contraction or a dilation. Tables 7 and 8 illustrate these operators.


Figure  4.9.5 


Table  7 


Operator                            Effect on the Standard Basis         Standard Matrix
T(x, y) = (kx, ky)

Contraction with factor k           (1, 0) → (k, 0)                      [k  0]
on R^2 (0 < k < 1)                  (0, 1) → (0, k)                      [0  k]

Dilation with factor k              (1, 0) → (k, 0)                      [k  0]
on R^2 (k > 1)                      (0, 1) → (0, k)                      [0  k]


Table  8 


Operator                            Standard Matrix
T(x, y, z) = (kx, ky, kz)

Contraction with factor k           [k  0  0]
on R^3 (0 < k < 1)                  [0  k  0]
                                    [0  0  k]

Dilation with factor k              [k  0  0]
on R^3 (k > 1)                      [0  k  0]
                                    [0  0  k]


Yaw,  Pitch,  and  Roll 

In aeronautics and astronautics, the orientation of an aircraft or space shuttle relative to an xyz-coordinate
system is often described in terms of angles called yaw, pitch, and roll. If, for example, an aircraft is flying
along the y-axis and the xy-plane defines the horizontal, then the aircraft’s angle of rotation about the z-axis is
called the yaw, its angle of rotation about the x-axis is called the pitch, and its angle of rotation about the y-axis
is called the roll. A combination of yaw, pitch, and roll can be achieved by a single rotation about some axis
through the origin. This is, in fact, how a space shuttle makes attitude adjustments: it doesn't perform each
rotation separately; it calculates one axis, and rotates about that axis to get the correct orientation. Such rotation
maneuvers are used to align an antenna, point the nose toward a celestial object, or position a payload bay for
docking.



Expansions and Compressions


In a dilation or contraction of R^2 or R^3, all coordinates are multiplied by a factor k. If only one of the coordinates is
multiplied by k, then the resulting operator is called an expansion or compression with factor k. This is illustrated in
Table 9 for R^2. You should have no trouble extending these results to R^3.


Table  9 


Operator                                  Effect on the Standard Basis       Standard Matrix

T(x, y) = (kx, y)

Compression of R^2 in the x-direction     (1, 0) → (k, 0)                    [k  0]
with factor k (0 < k < 1)                 (0, 1) → (0, 1)                    [0  1]

Expansion of R^2 in the x-direction       (1, 0) → (k, 0)                    [k  0]
with factor k (k > 1)                     (0, 1) → (0, 1)                    [0  1]

T(x, y) = (x, ky)

Compression of R^2 in the y-direction     (1, 0) → (1, 0)                    [1  0]
with factor k (0 < k < 1)                 (0, 1) → (0, k)                    [0  k]

Expansion of R^2 in the y-direction       (1, 0) → (1, 0)                    [1  0]
with factor k (k > 1)                     (0, 1) → (0, k)                    [0  k]



Shears 


A matrix operator of the form T(x, y) = (x + ky, y) translates a point (x, y) in the xy-plane parallel to the x-axis by
an amount ky that is proportional to the y-coordinate of the point. This operator leaves the points on the x-axis fixed
(since y = 0), but as we progress away from the x-axis, the translation distance increases. We call this operator the
shear in the x-direction with factor k. Similarly, a matrix operator of the form T(x, y) = (x, y + kx) is called the
shear in the y-direction with factor k. Table 10 illustrates the basic information about shears in R^2.


Table 10

| Operator | Effect on the Standard Basis (shown for $k > 0$ and $k < 0$) | Standard Matrix |
|---|---|---|
| Shear of $R^2$ in the x-direction with factor $k$: $T(x, y) = (x + ky, y)$ | $(1, 0) \to (1, 0)$, $(0, 1) \to (k, 1)$ | $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$ |
| Shear of $R^2$ in the y-direction with factor $k$: $T(x, y) = (x, y + kx)$ | $(1, 0) \to (1, k)$, $(0, 1) \to (0, 1)$ | $\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}$ |


EXAMPLE 5 Some Basic Matrix Operators on $R^2$


In each part describe the matrix operator corresponding to $A$, and show its effect on the unit square.

$$\text{(a) } A_1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \qquad \text{(b) } A_2 = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \qquad \text{(c) } A_3 = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$$

By comparing the forms of these matrices to those in Tables 7, 9, and 10, we see that the
matrix $A_1$ corresponds to a shear in the x-direction with factor 2, the matrix $A_2$ corresponds to a dilation
with factor 2, and $A_3$ corresponds to an expansion in the x-direction with factor 2. The effects of these
operators on the unit square are shown in Figure 4.9.6.
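The effect on the unit square described in this example can also be checked numerically. The following is a minimal sketch, not part of the text's method; the helper names `apply` and `image_of_square` are illustrative assumptions. It multiplies each corner of the unit square by the three matrices of the example.

```python
# Corners of the unit square, listed counterclockwise from the origin.
UNIT_SQUARE = [(0, 0), (1, 0), (1, 1), (0, 1)]

def apply(matrix, point):
    """Multiply a 2x2 matrix (given as a tuple of rows) by the point (x, y)."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)

def image_of_square(matrix):
    """Return the images of the four corners of the unit square."""
    return [apply(matrix, p) for p in UNIT_SQUARE]

A1 = ((1, 2), (0, 1))  # shear in the x-direction with factor 2
A2 = ((2, 0), (0, 2))  # dilation with factor 2
A3 = ((2, 0), (0, 1))  # expansion in the x-direction with factor 2

for name, m in (("A1", A1), ("A2", A2), ("A3", A3)):
    print(name, image_of_square(m))
```

The shear moves only the top edge of the square, the dilation doubles every coordinate, and the expansion stretches the square horizontally while leaving heights unchanged.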



Figure  4.9.6 


OPTIONAL 

Orthogonal  Projections  on  Lines  Through  the  Origin 


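In Table 3 we listed the standard matrices for the orthogonal projections on the coordinate axes in $R^2$. These are special
cases of the more general operator $T: R^2 \to R^2$ that maps each point into its orthogonal projection on a line $L$ through
the origin that makes an angle $\theta$ with the positive x-axis (Figure 4.9.7). In Example 4 of Section 3.3 we used Formula 10
of that section to find the orthogonal projections of the standard basis vectors for $R^2$ on that line. Expressed in matrix
form, we found those projections to be

$$T(e_1) = \begin{bmatrix} \cos^2\theta \\ \sin\theta\cos\theta \end{bmatrix} \quad \text{and} \quad T(e_2) = \begin{bmatrix} \sin\theta\cos\theta \\ \sin^2\theta \end{bmatrix}$$

Thus, the standard matrix for $T$ is

$$[T] = [\,T(e_1) \mid T(e_2)\,] = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \tfrac{1}{2}\sin 2\theta \\ \tfrac{1}{2}\sin 2\theta & \sin^2\theta \end{bmatrix}$$

In keeping with common usage, we will denote this operator by

$$P_\theta = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \tfrac{1}{2}\sin 2\theta \\ \tfrac{1}{2}\sin 2\theta & \sin^2\theta \end{bmatrix} \qquad (16)$$


We have included two versions of Formula 16
because both are commonly used. Whereas the first
version involves only the angle $\theta$, the second
involves both $\theta$ and $2\theta$.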


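EXAMPLE 6 Orthogonal Projection on a Line Through the Origin


Use Formula 16 to find the orthogonal projection of the vector $x = (1, 5)$ on the line through the origin
that makes an angle of $\pi/6$ $(= 30^\circ)$ with the x-axis.

Since $\sin(\pi/6) = 1/2$ and $\cos(\pi/6) = \sqrt{3}/2$, it follows from 16 that the standard matrix
for this projection is

$$P_{\pi/6} = \begin{bmatrix} \cos^2(\pi/6) & \sin(\pi/6)\cos(\pi/6) \\ \sin(\pi/6)\cos(\pi/6) & \sin^2(\pi/6) \end{bmatrix} = \begin{bmatrix} \tfrac{3}{4} & \tfrac{\sqrt{3}}{4} \\ \tfrac{\sqrt{3}}{4} & \tfrac{1}{4} \end{bmatrix}$$

Thus,

$$P_{\pi/6}\,x = \begin{bmatrix} \tfrac{3}{4} & \tfrac{\sqrt{3}}{4} \\ \tfrac{\sqrt{3}}{4} & \tfrac{1}{4} \end{bmatrix} \begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} \tfrac{3 + 5\sqrt{3}}{4} \\ \tfrac{\sqrt{3} + 5}{4} \end{bmatrix} \approx \begin{bmatrix} 2.91 \\ 1.68 \end{bmatrix}$$

or in comma-delimited notation, $P_{\pi/6}(1, 5) \approx (2.91, 1.68)$.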
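The computation in this example can be reproduced with a few lines of code. The sketch below is illustrative only; the function names `projection_matrix` and `apply` are assumptions introduced here. It builds the matrix of Formula 16 for a given angle and applies it to the vector (1, 5).

```python
import math

def projection_matrix(theta):
    """Standard matrix of Formula 16: projection onto the line at angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c * c, s * c), (s * c, s * s))

def apply(matrix, v):
    """Multiply a 2x2 matrix (tuple of rows) by the vector v = (x, y)."""
    (a, b), (c, d) = matrix
    x, y = v
    return (a * x + b * y, c * x + d * y)

p = apply(projection_matrix(math.pi / 6), (1, 5))
print(p)  # compare with the exact values (3 + 5*sqrt(3))/4 and (sqrt(3) + 5)/4
```

A useful sanity check is that projecting twice gives the same result as projecting once, since the projected point already lies on the line.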


Reflections  About  Lines  Through  the  Origin 

In Table 1 we listed the reflections about the coordinate axes in $R^2$. These are special cases of the more general operator
$H_\theta: R^2 \to R^2$ that maps each point into its reflection about a line $L$ through the origin that makes an angle $\theta$ with the
positive x-axis (Figure 4.9.8). We could find the standard matrix for $H_\theta$ by finding the images of the standard basis
vectors, but instead we will take advantage of our work on orthogonal projections by using Formula 16 for $P_\theta$ to
find a formula for $H_\theta$.


Figure  4.9.8 


You  should  be  able  to  see  from  Figure  4.9.9  that  for  every  vector  x in  Rn 

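$$P_\theta x - x = \tfrac{1}{2}\left(H_\theta x - x\right) \quad \text{or equivalently} \quad H_\theta x = \left(2P_\theta - I\right)x$$

Thus, it follows from Theorem 4.9.2 that

$$H_\theta = 2P_\theta - I \qquad (17)$$

and hence from 16 that

$$H_\theta = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix} \qquad (18)$$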


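EXAMPLE 7 Reflection About a Line Through the Origin


Find the reflection of the vector $x = (1, 5)$ about the line through the origin that makes an angle of $\pi/6$ $(= 30^\circ)$
with the x-axis.

Since $\sin(\pi/3) = \sqrt{3}/2$ and $\cos(\pi/3) = 1/2$, it follows from 18 that the standard matrix
for this reflection is

$$H_{\pi/6} = \begin{bmatrix} \cos(\pi/3) & \sin(\pi/3) \\ \sin(\pi/3) & -\cos(\pi/3) \end{bmatrix} = \begin{bmatrix} \tfrac{1}{2} & \tfrac{\sqrt{3}}{2} \\ \tfrac{\sqrt{3}}{2} & -\tfrac{1}{2} \end{bmatrix}$$

Thus,

$$H_{\pi/6}\,x = \begin{bmatrix} \tfrac{1}{2} & \tfrac{\sqrt{3}}{2} \\ \tfrac{\sqrt{3}}{2} & -\tfrac{1}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} \tfrac{1 + 5\sqrt{3}}{2} \\ \tfrac{\sqrt{3} - 5}{2} \end{bmatrix} \approx \begin{bmatrix} 4.83 \\ -1.63 \end{bmatrix}$$

or in comma-delimited notation, $H_{\pi/6}(1, 5) \approx (4.83, -1.63)$.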
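Formulas 17 and 18 give two routes to the same reflection matrix, and it can be reassuring to confirm numerically that they agree. The sketch below is a hedged illustration; the function names are assumptions made for this example. It builds the reflection matrix both as 2P - I and from the double-angle form, then reflects (1, 5).

```python
import math

def reflection_from_projection(theta):
    """Reflection matrix built as 2P - I (Formula 17), with P from Formula 16."""
    c, s = math.cos(theta), math.sin(theta)
    return ((2 * c * c - 1, 2 * s * c), (2 * s * c, 2 * s * s - 1))

def reflection_double_angle(theta):
    """Reflection matrix written with the angle 2*theta (Formula 18)."""
    c2, s2 = math.cos(2 * theta), math.sin(2 * theta)
    return ((c2, s2), (s2, -c2))

def apply(matrix, v):
    """Multiply a 2x2 matrix (tuple of rows) by the vector v = (x, y)."""
    (a, b), (c, d) = matrix
    x, y = v
    return (a * x + b * y, c * x + d * y)

theta = math.pi / 6
h = apply(reflection_double_angle(theta), (1, 5))
print(round(h[0], 2), round(h[1], 2))
```

Reflecting the result a second time should return the original vector, which is a quick way to catch sign errors in the matrix.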


Show  that  the  standard  matrices  in  Tables  1 and  3 
are  special  cases  of  18  and  16. 


Concept  Review 

Function 

Image 


Value 

Domain 

Codomain 

Transformation 

Relationships  among  the  fundamental  spaces 
Operator 

Matrix  transformation 
Matrix  operator 
Standard  matrix 

Properties  of  matrix  transformations 

Zero  transformation 

Identity  operator 

Reflection  operator 

Projection  operator 

Rotation  operator 

Rotation  matrix 

Rotation  equations 

Axis  of  rotation  in  3 -space 

Angle  of  rotation  in  3-space 

Expansion  operator 

Compression  operator 

Shear 

Dilation 

Contraction 

Skills 

Find  the  domain  and  codomain  of  a transformation,  and  determine  whether  the  transformation  is  linear. 
Find  the  standard  matrix  for  a matrix  transformation. 

Describe  the  effect  of  a matrix  operator  on  the  standard  basis  in  Rn. 


Exercise  Set  4.9 

In Exercises 1-2, find the domain and codomain of the transformation $T_A(x) = Ax$.

1. (a) $A$ has size $3 \times 2$.

(b) $A$ has size $2 \times 3$.

(c) $A$ has size $3 \times 3$.

(d) $A$ has size $1 \times 6$.


Answer:


(a) Domain: $R^2$; codomain: $R^3$

(b) Domain: $R^3$; codomain: $R^2$

(c) Domain: $R^3$; codomain: $R^3$

(d) Domain: $R^6$; codomain: $R^1$

2. (a) $A$ has size $4 \times 5$.

(b) $A$ has size $5 \times 4$.

(c) $A$ has size $4 \times 4$.

(d) $A$ has size $3 \times 1$.

3. If $T(x_1, x_2) = (x_1 + x_2, -x_2, 3x_1)$, then the domain of $T$ is ____, the codomain of $T$ is ____, and

the image of $x = (1, -2)$ under $T$ is ____.

Answer: 

R2,  R3,  (-1,2,3) 

4. If $T(x_1, x_2, x_3) = (x_1 + 2x_2, x_1 - 2x_2)$, then the domain of $T$ is ____, the codomain of $T$ is ____,

and the image of $x = (0, -1, 4)$ under $T$ is ____.

5.  In  each  part,  find  the  domain  and  codomain  of  the  transformation  defined  by  the  equations,  and  determine  whether 
the  transformation  is  linear. 

(a) $w_1 = 3x_1 - 2x_2 + 4x_3$
$w_2 = 5x_1 - 8x_2 + x_3$

(b) $w_1 = 2x_1x_2 - x_2$
$w_2 = x_1 + 3x_1x_2$
$w_3 = x_1 + x_2$

(c) $w_1 = 5x_1 - x_2 + x_3$
$w_2 = -x_1 + x_2 + 7x_3$
$w_3 = 2x_1 - 4x_2 - x_3$

(d) $w_1 = x_1^2 - 3x_2 + x_3 - 2x_4$
$w_2 = 3x_1 - 4x_2 - x_3^2 + x_4$

Answer:

(a) Linear; $R^3 \to R^2$

(b) Nonlinear; $R^2 \to R^3$

(c) Linear; $R^3 \to R^3$

(d) Nonlinear; $R^4 \to R^2$

6.  In  each  part,  determine  whether  T is  a matrix  transformation. 

(a) $T(x, y) = (2x, y)$

(b) $T(x, y) = (-y, x)$

(c) $T(x, y) = (2x + y, x - y)$

(d) $T(x, y) = (x^2, y)$

(e) $T(x, y) = (x, y + 1)$


7.  In  each  part,  determine  whether  T is  a matrix  transformation. 

(a)  T(x,y,z)  = (0,  0) 

(b)  T(x,y,z)  = (1,  1) 

(c) $T(x, y, z) = (3x - 4y, 2x - 5z)$

(d) $T(x, y, z) = (y^2, z)$

(e) $T(x, y, z) = (y - 1, x)$

Answer: 

(a)  and  (c)  are  matrix  transformations;  (b),  (d),  and  (e)  are  not  matrix  transformations. 

8.  Find  the  standard  matrix  for  the  transformation  defined  by  the  equations. 


(a)  wi 

= 

2xi  - 3x2  + 

vt?2 

= 

3xi  + 5x2  ~ 

(b) $w_1 = 7x_1 + 2x_2 - 8x_3$
$w_2 = -x_2 + 5x_3$
$w_3 = 4x_1 + 7x_2 - x_3$

(c) $w_1 = -x_1 + x_2$
$w_2 = 3x_1 - 2x_2$
$w_3 = 5x_1 - 7x_2$

(d) $w_1 = x_1$
$w_2 = x_1 + x_2$
$w_3 = x_1 + x_2 + x_3$
$w_4 = x_1 + x_2 + x_3 + x_4$

9. Find the standard matrix for the operator $T: R^3 \to R^3$ defined by

$w_1 = 3x_1 + 5x_2 - x_3$

$w_2 = 4x_1 - x_2 + x_3$

$w_3 = 3x_1 + 2x_2 - x_3$

and then calculate $T(-1, 2, 4)$ by directly substituting in the equations and also by matrix multiplication.
Answer:

$$\begin{bmatrix} 3 & 5 & -1 \\ 4 & -1 & 1 \\ 3 & 2 & -1 \end{bmatrix}; \quad T(-1, 2, 4) = (3, -2, -3)$$

10.  Find  the  standard  matrix  for  the  operator  T defined  by  the  formula. 

(a) $T(x_1, x_2) = (2x_1 - x_2, x_1 + x_2)$

(b) $T(x_1, x_2) = (x_1, x_2)$

(c) $T(x_1, x_2, x_3) = (x_1 + 2x_2 + x_3, x_1 + 5x_2, x_3)$

(d) $T(x_1, x_2, x_3) = (4x_1, 7x_2, -8x_3)$

11.  Find  the  standard  matrix  for  the  transformation  T defined  by  the  formula. 

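(a) $T(x_1, x_2) = (x_2, -x_1, x_1 + 3x_2, x_1 - x_2)$

(b) $T(x_1, x_2, x_3, x_4) = (7x_1 + 2x_2 - x_3 + x_4, x_2 + x_3, -x_1)$

(c) $T(x_1, x_2, x_3) = (0, 0, 0, 0, 0)$

(d) $T(x_1, x_2, x_3, x_4) = (x_4, x_1, x_3, x_2, x_1 - x_3)$


Answer:

(a) $\begin{bmatrix} 0 & 1 \\ -1 & 0 \\ 1 & 3 \\ 1 & -1 \end{bmatrix}$

(b) $\begin{bmatrix} 7 & 2 & -1 & 1 \\ 0 & 1 & 1 & 0 \\ -1 & 0 & 0 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 \end{bmatrix}$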

12.  In  each  part,  find  T{x),  and  express  the  answer  in  matrix  form. 


(a) 


x = 


3 

-2 


(b) 


(c) 


T 


-12  0 
3 1 5 


1 

1 

CO 

4^ 

"*r 

T 

= 

3 5 7 

; x = 

*2 

l 

o 

*3 

(d) 


T 


-1  1 

2 4 ; 
7 8 


x = 


*1 

x2 


13.  In  each  part,  use  the  standard  matrix  for  T to  find  7(x);  then  check  the  result  by  calculating  T(x)  directly. 

(a) $T(x_1, x_2) = (-x_1 + x_2, x_2)$; $x = (-1, 4)$

(b) $T(x_1, x_2, x_3) = (2x_1 - x_2 + x_3, x_2 + x_3, 0)$; $x = (2, 1, -3)$


Answer:

(a) $T(-1, 4) = (5, 4)$

(b) $T(2, 1, -3) = (0, -2, 0)$

14. Use matrix multiplication to find the reflection of $(-1, 2)$ about
(a) the x-axis.

(b) the y-axis.

(c) the line $y = x$.


15. Use matrix multiplication to find the reflection of $(2, -5, 3)$ about

(a) the xy-plane.

(b) the xz-plane.

(c) the yz-plane.

Answer:

(a) $(2, -5, -3)$

(b) $(2, 5, 3)$

(c) $(-2, -5, 3)$

16. Use matrix multiplication to find the orthogonal projection of $(2, -5)$ on

(a) the x-axis.

(b) the y-axis.

17. Use matrix multiplication to find the orthogonal projection of $(-2, 1, 3)$ on

(a) the xy-plane.

(b) the xz-plane.

(c) the yz-plane.

Answer: 

(a)  (-2,1,0) 

(b)  (-2,0,3) 

(c)  (0,  1,  3) 

18. Use matrix multiplication to find the image of the vector $(3, -4)$ when it is rotated through an angle of

(a) $\theta = 30^\circ$.

(b) $\theta = -60^\circ$.

(c) $\theta = 45^\circ$.

(d) $\theta = 90^\circ$.

19. Use matrix multiplication to find the image of the vector $(-2, 1, 2)$ if it is rotated

(a) $30^\circ$ about the x-axis.

(b) $45^\circ$ about the y-axis.

(c) $90^\circ$ about the z-axis.


Answer:


(b) $(0, 1, 2\sqrt{2})$

(c) $(-1, -2, 2)$


20. Find the standard matrix for the operator that rotates a vector in $R^3$ through an angle of $-60^\circ$ about

(a) the x-axis.

(b) the y-axis.

(c) the z-axis.

21. Use matrix multiplication to find the image of the vector $(-2, 1, 2)$ if it is rotated

(a) $-30^\circ$ about the x-axis.

(b) $-45^\circ$ about the y-axis.

(c) $-90^\circ$ about the z-axis.

Answer:

(a) $\left(-2, \dfrac{\sqrt{3} + 2}{2}, \dfrac{-1 + 2\sqrt{3}}{2}\right)$

(b) $(-2\sqrt{2}, 1, 0)$

(c) $(1, 2, 2)$

22. In $R^3$ the orthogonal projections on the x-axis, y-axis, and z-axis are defined by

$$T_1(x, y, z) = (x, 0, 0), \quad T_2(x, y, z) = (0, y, 0), \quad T_3(x, y, z) = (0, 0, z)$$

respectively.

(a) Show that the orthogonal projections on the coordinate axes are matrix operators, and find their standard
matrices.

(b) Show that if $T: R^3 \to R^3$ is an orthogonal projection on one of the coordinate axes, then for every vector $x$ in
$R^3$, the vectors $T(x)$ and $x - T(x)$ are orthogonal.

(c) Make a sketch showing $x$ and $x - T(x)$ in the case where $T$ is the orthogonal projection on the x-axis.

23. Use Formula 15 to derive the standard matrices for the rotations about the x-axis, y-axis, and z-axis in $R^3$.

24. Use Formula 15 to find the standard matrix for a rotation of $\pi/2$ radians about the axis determined by the vector
$v = (1, 1, 1)$. [Note: Formula 15 requires that the vector defining the axis of rotation have length 1.]

25. Use Formula 15 to find the standard matrix for a rotation of $180^\circ$ about the axis determined by the vector
$v = (2, 2, 1)$. [Note: Formula 15 requires that the vector defining the axis of rotation have length 1.]

Answer:

$$\begin{bmatrix} -\tfrac{1}{9} & \tfrac{8}{9} & \tfrac{4}{9} \\ \tfrac{8}{9} & -\tfrac{1}{9} & \tfrac{4}{9} \\ \tfrac{4}{9} & \tfrac{4}{9} & -\tfrac{7}{9} \end{bmatrix}$$

26. It can be proved that if $A$ is a $2 \times 2$ matrix with orthonormal column vectors and for which $\det(A) = 1$, then
multiplication by $A$ is a rotation through some angle $\theta$. Verify that


!_ 1_ 

J2  {2 

A=  1 .1 

{2  {2 

satisfies  the  stated  conditions  and  find  the  angle  of  rotation. 

27. The result stated in Exercise 26 can be extended to $R^3$; that is, it can be proved that if $A$ is a $3 \times 3$ matrix with

orthonormal column vectors and for which $\det(A) = 1$, then multiplication by $A$ is a rotation about some axis

through some angle $\theta$. Use Formula 15 to show that the angle of rotation satisfies the equation

$$\cos\theta = \frac{\operatorname{tr}(A) - 1}{2}$$


28. Let $A$ be a $3 \times 3$ matrix (other than the identity matrix) satisfying the conditions stated in Exercise 27. It can be
shown that if $x$ is any nonzero vector in $R^3$, then the vector $u = Ax + A^T x + [1 - \operatorname{tr}(A)]x$ determines an axis of

rotation when $u$ is positioned with its initial point at the origin. [See "The Axis of Rotation: Analysis, Algebra,
Geometry," by Dan Kalman, Mathematics Magazine, Vol. 62, No. 4, October 1989.]

(a)  Show  that  multiplication  by 


is  a rotation. 

(b)  Find  a vector  of  length  1 that  defines  an  axis  for  the  rotation. 

(c)  Use  the  result  in  Exercise  27  to  find  the  angle  of  rotation  about  the  axis  obtained  in  part  (b). 

29.  In  words,  describe  the  geometric  effect  of  multiplying  a vector  x by  the  matrix  A. 

A=\ 2 °1 


0 —2 


Answer: 


(a)  Twice  the  orthogonal  projection  on  the  x-axis. 

(b)  Twice  the  reflection  about  the  x-axis. 

30.  In  words,  describe  the  geometric  effect  of  multiplying  a vector  x by  the  matrix  A. 

(a)  j4=[2  01 


'a  _i 

2 2 

1 £ 

2 2 


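31. In words, describe the geometric effect of multiplying a vector $x$ by the matrix

$$A = \begin{bmatrix} \cos^2\theta - \sin^2\theta & -2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \cos^2\theta - \sin^2\theta \end{bmatrix}$$

Answer:

Rotation through the angle $2\theta$.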

32.  If  multiplication  by  A rotates  a vector  x in  the  xy-plane  through  an  angle  0,  what  is  the  effect  of  multiplying  x by  A ^ 
? Explain  your  reasoning. 

33. Let $x_0$ be a nonzero column vector in $R^2$, and suppose that $T: R^2 \to R^2$ is the transformation defined by the formula
$T(x) = x_0 + R_\theta x$, where $R_\theta$ is the standard matrix of the rotation of $R^2$ about the origin through the angle $\theta$. Give a
geometric description of this transformation. Is it a matrix transformation? Explain.

Answer:

Rotation through the angle $\theta$ and translation by $x_0$; not a matrix transformation since $x_0$ is nonzero.

34. A function of the form $f(x) = mx + b$ is commonly called a "linear function" because the graph of $y = mx + b$ is
a line. Is $f$ a matrix transformation on $R$?

35. Let $x = x_0 + tv$ be a line in $R^n$, and let $T: R^n \to R^n$ be a matrix operator on $R^n$. What kind of geometric object is
the image of this line under the operator $T$? Explain your reasoning.

Answer: 

A line  in  Rn. 

True-False  Exercises 

In  parts  (a)-(i)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a) If $A$ is a $2 \times 3$ matrix, then the domain of the transformation $T_A$ is $R^2$.

Answer:

False

(b) If $A$ is an $m \times n$ matrix, then the codomain of the transformation $T_A$ is $R^n$.

Answer:

False

(c) If $T: R^n \to R^m$ and $T(0) = 0$, then $T$ is a matrix transformation.

Answer:

False

(d) If $T: R^n \to R^m$ and $T(c_1x + c_2y) = c_1T(x) + c_2T(y)$ for all scalars $c_1$ and $c_2$ and all vectors $x$ and $y$ in $R^n$, then
$T$ is a matrix transformation.

Answer:

True

(e) There is only one matrix transformation $T: R^n \to R^m$ such that $T(-x) = -T(x)$ for every vector $x$ in $R^n$.

Answer:

False

(f) There is only one matrix transformation $T: R^n \to R^m$ such that $T(x + y) = T(x - y)$ for all vectors $x$ and $y$ in $R^n$.

Answer:

True

(g) If $b$ is a nonzero vector in $R^n$, then $T(x) = x + b$ is a matrix operator on $R^n$.

Answer: 

False 

(h) 

The  matrix 


is  the  standard  matrix  for  a rotation. 


Answer: 

False 

(i) The standard matrices of the reflections about the coordinate axes in 2-space have the form

$$\begin{bmatrix} a & 0 \\ 0 & -a \end{bmatrix}$$

where $a = \pm 1$.

Answer:

True

4.10 Properties of Matrix Transformations

In  this  section  we  will  discuss  properties  of  matrix  transformations.  We  will  show,  for  example,  that  if  several 
matrix  transformations  are  performed  in  succession,  then  the  same  result  can  be  obtained  by  a single  matrix 
transformation  that  is  chosen  appropriately.  We  will  also  explore  the  relationship  between  the  invertibility  of  a 
matrix  and  properties  of  the  corresponding  transformation. 


Compositions  of  Matrix  Transformations 

Suppose that $T_A$ is a matrix transformation from $R^n$ to $R^k$ and $T_B$ is a matrix transformation from $R^k$ to $R^m$. If $x$
is a vector in $R^n$, then $T_A$ maps this vector into a vector $T_A(x)$ in $R^k$, and $T_B$, in turn, maps that vector into the
vector $T_B(T_A(x))$ in $R^m$. This process creates a transformation from $R^n$ to $R^m$ that we call the composition of
$T_B$ with $T_A$ and denote by the symbol

$$T_B \circ T_A$$

which is read "$T_B$ circle $T_A$." As illustrated in Figure 4.10.1, the transformation $T_A$ in the formula is performed
first; that is,

$$(T_B \circ T_A)(x) = T_B(T_A(x)) \qquad (1)$$

This composition is itself a matrix transformation since

$$(T_B \circ T_A)(x) = T_B(T_A(x)) = B(T_A(x)) = B(Ax) = (BA)x$$

which shows that it is multiplication by $BA$. This is expressed by the formula

$$T_B \circ T_A = T_{BA} \qquad (2)$$
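Formula 2 can be checked on a small numerical case. The sketch below is only an illustration under assumed data: the matrices `A` and `B`, the vector `x`, and the helper names `matvec` and `matmul` are all made up for this example. It applies the two transformations in succession and compares the result with a single multiplication by BA.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def matmul(p, q):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*q))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in p]

A = [[1, 2], [0, 1], [3, -1]]  # T_A maps R^2 to R^3 (so k = 3)
B = [[2, 0, 1], [1, -1, 0]]    # T_B maps R^3 to R^2
x = [4, -3]

two_steps = matvec(B, matvec(A, x))  # T_B(T_A(x)): T_A is applied first
one_step = matvec(matmul(B, A), x)   # multiplication by the single matrix BA
print(two_steps, one_step)
```

Note that the product is formed as BA, with the matrix of the transformation applied first written on the right.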


WARNING 

Just  as  it  is  not  true,  in  general,  that 

AB  = BA 

so  it  is  not  true,  in  general,  that 

$$T_B \circ T_A = T_A \circ T_B$$

That  is,  order  matters  when  matrix 
transformations  are  composed. 




Figure  4.10.1 


Compositions can be defined for any finite succession of matrix transformations whose domains and ranges have
the appropriate dimensions. For example, to extend Formula 2 to three factors, consider the matrix
transformations

$$T_A: R^n \to R^k, \quad T_B: R^k \to R^l, \quad T_C: R^l \to R^m$$

We define the composition $(T_C \circ T_B \circ T_A): R^n \to R^m$ by

$$(T_C \circ T_B \circ T_A)(x) = T_C(T_B(T_A(x)))$$

As above, it can be shown that this is a matrix transformation whose standard matrix is $CBA$ and that

$$T_C \circ T_B \circ T_A = T_{CBA} \qquad (3)$$

As in Formula 9 of Section 4.9, we can use square brackets to denote a matrix transformation without
referencing a specific matrix. Thus, for example, the formula

$$[T_2 \circ T_1] = [T_2][T_1] \qquad (4)$$

is a restatement of Formula 2 which states that the standard matrix for a composition is the product of the
standard matrices in the appropriate order. Similarly,

$$[T_3 \circ T_2 \circ T_1] = [T_3][T_2][T_1] \qquad (5)$$

is a restatement of Formula 3.

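EXAMPLE 1 Composition of Two Rotations

Let $T_1: R^2 \to R^2$ and $T_2: R^2 \to R^2$ be the matrix operators that rotate vectors through the angles $\theta_1$
and $\theta_2$, respectively. Thus the operation

$$(T_2 \circ T_1)(x) = T_2(T_1(x))$$

first rotates $x$ through the angle $\theta_1$, then rotates $T_1(x)$ through the angle $\theta_2$. It follows that the net
effect of $T_2 \circ T_1$ is to rotate each vector in $R^2$ through the angle $\theta_1 + \theta_2$ (Figure 4.10.2). Thus, the
standard matrices for these matrix operators are

$$[T_1] = \begin{bmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{bmatrix}, \quad [T_2] = \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{bmatrix},$$

$$[T_2 \circ T_1] = \begin{bmatrix} \cos(\theta_1 + \theta_2) & -\sin(\theta_1 + \theta_2) \\ \sin(\theta_1 + \theta_2) & \cos(\theta_1 + \theta_2) \end{bmatrix}$$

These matrices should satisfy 4. With the help of some basic trigonometric identities, we can
confirm that this is so as follows:

$$[T_2][T_1] = \begin{bmatrix} \cos\theta_2 & -\sin\theta_2 \\ \sin\theta_2 & \cos\theta_2 \end{bmatrix} \begin{bmatrix} \cos\theta_1 & -\sin\theta_1 \\ \sin\theta_1 & \cos\theta_1 \end{bmatrix}$$

$$= \begin{bmatrix} \cos\theta_2\cos\theta_1 - \sin\theta_2\sin\theta_1 & -(\cos\theta_2\sin\theta_1 + \sin\theta_2\cos\theta_1) \\ \sin\theta_2\cos\theta_1 + \cos\theta_2\sin\theta_1 & -\sin\theta_2\sin\theta_1 + \cos\theta_2\cos\theta_1 \end{bmatrix}$$

$$= \begin{bmatrix} \cos(\theta_1 + \theta_2) & -\sin(\theta_1 + \theta_2) \\ \sin(\theta_1 + \theta_2) & \cos(\theta_1 + \theta_2) \end{bmatrix} = [T_2 \circ T_1]$$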
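The identity confirmed in this example can also be spot-checked numerically. The following sketch is an illustration only; the angles and the helper names `rotation` and `matmul` are arbitrary choices made here. It multiplies two rotation matrices and compares the product with the rotation matrix for the sum of the angles.

```python
import math

def rotation(theta):
    """Standard matrix for the rotation of R^2 through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(p, q):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*q))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in p]

theta1, theta2 = 0.4, 1.1  # arbitrary sample angles in radians
product = matmul(rotation(theta2), rotation(theta1))  # [T2][T1]
direct = rotation(theta1 + theta2)                    # [T2 o T1]

max_error = max(abs(product[i][j] - direct[i][j])
                for i in range(2) for j in range(2))
print(max_error)  # expected to be at the level of floating-point round-off
```

A numerical check of this kind does not replace the trigonometric argument, but it is a quick way to detect a misplaced sign.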


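EXAMPLE 2 Composition Is Not Commutative


Let $T_1: R^2 \to R^2$ be the reflection about the line $y = x$, and let $T_2: R^2 \to R^2$ be the orthogonal
projection on the y-axis. Figure 4.10.3 illustrates graphically that $T_1 \circ T_2$ and $T_2 \circ T_1$ have
different effects on a vector $x$. This same conclusion can be reached by showing that the standard
matrices for $T_1$ and $T_2$ do not commute:

$$[T_1 \circ T_2] = [T_1][T_2] = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

$$[T_2 \circ T_1] = [T_2][T_1] = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$$

so $[T_2 \circ T_1] \ne [T_1 \circ T_2]$.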
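To see the failure of commutativity on a concrete vector, the two compositions can be applied to a sample point. This is a minimal sketch; the sample vector (3, 4) and the helper names are assumptions made for illustration.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

def matmul(p, q):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*q))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in p]

T1 = [[0, 1], [1, 0]]  # reflection about the line y = x
T2 = [[0, 0], [0, 1]]  # orthogonal projection on the y-axis

T1_after_T2 = matmul(T1, T2)  # standard matrix of T1 o T2 (T2 applied first)
T2_after_T1 = matmul(T2, T1)  # standard matrix of T2 o T1 (T1 applied first)

x = [3, 4]
print(matvec(T1_after_T2, x), matvec(T2_after_T1, x))
```

Projecting first and then reflecting sends (3, 4) to a point on the x-axis, whereas reflecting first and then projecting sends it to a point on the y-axis.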


Figure  4.10.3 


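EXAMPLE 3 Composition of Two Reflections

Let $T_1: R^2 \to R^2$ be the reflection about the y-axis, and let $T_2: R^2 \to R^2$ be the reflection about the
x-axis. In this case $T_1 \circ T_2$ and $T_2 \circ T_1$ are the same; both map every vector $x = (x, y)$ into its
negative $-x = (-x, -y)$ (Figure 4.10.4):

$$(T_1 \circ T_2)(x, y) = T_1(x, -y) = (-x, -y)$$

$$(T_2 \circ T_1)(x, y) = T_2(-x, y) = (-x, -y)$$

The equality of $T_1 \circ T_2$ and $T_2 \circ T_1$ can also be deduced by showing that the standard matrices for
$T_1$ and $T_2$ commute:

$$[T_1 \circ T_2] = [T_1][T_2] = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$$

$$[T_2 \circ T_1] = [T_2][T_1] = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$$

The operator $T(x) = -x$ on $R^2$ or $R^3$ is called the reflection about the origin. As the foregoing
computations show, the standard matrix for this operator on $R^2$ is

$$[T] = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$$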

Figure  4.10.4 


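EXAMPLE 4 Composition of Three Transformations

Find the standard matrix for the operator $T: R^3 \to R^3$ that first rotates a vector counterclockwise
about the z-axis through an angle $\theta$, then reflects the resulting vector about the yz-plane, and then
projects that vector orthogonally onto the xy-plane.

The operator $T$ can be expressed as the composition

$$T = T_3 \circ T_2 \circ T_1$$

where $T_1$ is the rotation about the z-axis, $T_2$ is the reflection about the yz-plane, and $T_3$ is the
orthogonal projection on the xy-plane. From Tables 6, 2, and 4 of Section 4.9, the standard
matrices for these operators are

$$[T_1] = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad [T_2] = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad [T_3] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

Thus, it follows from 5 that the standard matrix for $T$ is

$$[T] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -\cos\theta & \sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 \end{bmatrix}$$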
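The three-factor product in this example can be verified for a particular angle. The sketch below is illustrative; the sample angle and the helper name `matmul` are assumptions. It forms the product in the order required by Formula 5 and compares it with the closed form found above.

```python
import math

def matmul(p, q):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*q))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in p]

theta = 0.7  # arbitrary sample angle in radians
c, s = math.cos(theta), math.sin(theta)

T1 = [[c, -s, 0], [s, c, 0], [0, 0, 1]]   # rotation about the z-axis
T2 = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]   # reflection about the yz-plane
T3 = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]    # projection on the xy-plane

# The operator applied first (T1) is the rightmost factor.
T = matmul(T3, matmul(T2, T1))
expected = [[-c, s, 0], [s, c, 0], [0, 0, 0]]
print(T)
```

Because matrix multiplication is associative, grouping the product as `matmul(matmul(T3, T2), T1)` gives the same matrix; only the left-to-right order of the factors matters.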


One-to-One  Matrix  Transformations 

Our next objective is to establish a link between the invertibility of a matrix $A$ and properties of the
corresponding matrix transformation $T_A$.


DEFINITION 1

A matrix transformation $T_A: R^n \to R^m$ is said to be one-to-one if $T_A$ maps distinct vectors (points) in $R^n$
into distinct vectors (points) in $R^m$.


(See Figure 4.10.5.) This idea can be expressed in various ways. For example, you should be able to see that the
following are just restatements of Definition 1:

$T_A$ is one-to-one if for each vector $b$ in the range of $A$ there is exactly one vector $x$ in $R^n$ such that $T_A x = b$.
$T_A$ is one-to-one if the equality $T_A(u) = T_A(v)$ implies that $u = v$.




Figure  4.10.5 


Rotation operators on $R^2$ are one-to-one since distinct vectors that are rotated through the same angle have
distinct images (Figure 4.10.6). In contrast, the orthogonal projection of $R^3$ on the xy-plane is not one-to-one
because it maps distinct points on the same vertical line into the same point (Figure 4.10.7).


Figure 4.10.6 Distinct vectors $u$ and $v$ are rotated into distinct vectors $T(u)$ and $T(v)$


Figure 4.10.7 The distinct points $P$ and $Q$ are mapped into the same point $M$

The  following  theorem  establishes  a fundamental  relationship  between  the  invertibility  of  a matrix  and  properties 
of  the  corresponding  matrix  transformation. 


THEOREM 4.10.1

If $A$ is an $n \times n$ matrix and $T_A: R^n \to R^n$ is the corresponding matrix operator, then the following
statements are equivalent.

(a) $A$ is invertible.

(b) The range of $T_A$ is $R^n$.

(c) $T_A$ is one-to-one.

Proof We will establish the chain of implications (a) ⇒ (b) ⇒ (c) ⇒ (a).

(a) ⇒ (b) Assume that $A$ is invertible. By parts (a) and (e) of Theorem 4.8.10, the system $Ax = b$ is consistent
for every $n \times 1$ matrix $b$ in $R^n$. This implies that $T_A$ maps $x$ into the arbitrary vector $b$ in $R^n$, which in turn
implies that the range of $T_A$ is all of $R^n$.

(b) ⇒ (c) Assume that the range of $T_A$ is $R^n$. This implies that for every vector $b$ in $R^n$ there is some vector $x$
in $R^n$ for which $T_A(x) = b$ and hence that the linear system $Ax = b$ is consistent for every vector $b$ in $R^n$. But
the equivalence of parts (e) and (f) of Theorem 4.8.10 implies that $Ax = b$ has a unique solution for every vector
$b$ in $R^n$ and hence that for every vector $b$ in the range of $T_A$ there is exactly one vector $x$ in $R^n$ such that
$T_A(x) = b$.

(c) ⇒ (a) Assume that $T_A$ is one-to-one. Thus, if $b$ is a vector in the range of $T_A$, there is a unique vector $x$ in
$R^n$ for which $T_A(x) = b$. We leave it for you to complete the proof using Exercise 30.


EXAMPLE 5 Properties of a Rotation Operator

As indicated in Figure 4.10.6, the operator $T: R^2 \to R^2$ that rotates vectors in $R^2$ through an angle
$\theta$ is one-to-one. Confirm that $[T]$ is invertible in accordance with Theorem 4.10.1.


From Table 5 of Section 4.9 the standard matrix for $T$ is

$$[T] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

This matrix is invertible because

$$\det[T] = \begin{vmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{vmatrix} = \cos^2\theta + \sin^2\theta = 1 \ne 0$$


EXAMPLE 6 Properties of a Projection Operator

As indicated in Figure 4.10.7, the operator $T: R^3 \to R^3$ that projects each vector in $R^3$
orthogonally on the xy-plane is not one-to-one. Confirm that $[T]$ is not invertible in accordance
with Theorem 4.10.1.


From Table 4 of Section 4.9 the standard matrix for $T$ is

$$[T] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

This matrix is not invertible since $\det[T] = 0$.
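The determinant tests used in Examples 5 and 6 are easy to carry out by machine. The following sketch is an illustration; the helper names `det2` and `det3` and the sample angle are assumptions. It evaluates both determinants and also exhibits two distinct points with the same image under the projection.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as a list of rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

theta = 1.3  # arbitrary sample angle in radians
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
projection = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]

print(det2(rotation))    # close to 1, so the rotation matrix is invertible
print(det3(projection))  # 0, so the projection matrix is not invertible

def project(p):
    """Orthogonal projection of a point of R^3 on the xy-plane."""
    return [sum(a * x for a, x in zip(row, p)) for row in projection]

# Two distinct points on the same vertical line have the same image.
print(project([2, 5, 1]), project([2, 5, -7]))
```

The last line illustrates directly why the projection fails to be one-to-one, in agreement with Theorem 4.10.1.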


Inverse  of  a One-to-One  Matrix  Operator 

If $T_A: R^n \to R^n$ is a one-to-one matrix operator, then it follows from Theorem 4.10.1 that $A$ is invertible. The
matrix operator

$$T_{A^{-1}}: R^n \to R^n$$

that corresponds to $A^{-1}$ is called the inverse operator or (more simply) the inverse of $T_A$. This terminology is
appropriate because $T_A$ and $T_{A^{-1}}$ cancel the effect of each other in the sense that if $x$ is any vector in $R^n$, then

$$T_A(T_{A^{-1}}(x)) = AA^{-1}x = Ix = x$$
$$T_{A^{-1}}(T_A(x)) = A^{-1}Ax = Ix = x$$

or, equivalently,

$$T_A \circ T_{A^{-1}} = T_{AA^{-1}} = T_I$$
$$T_{A^{-1}} \circ T_A = T_{A^{-1}A} = T_I$$

From a more geometric viewpoint, if $w$ is the image of $x$ under $T_A$, then $T_{A^{-1}}$ maps $w$ back into $x$, since

$$T_{A^{-1}}(w) = T_{A^{-1}}(T_A(x)) = x$$

(Figure 4.10.8).


Figure 4.10.8


Before considering examples, it will be helpful to touch on some notational matters. If $T_A: R^n \to R^n$ is a
one-to-one matrix operator, and if $T_A^{-1}: R^n \to R^n$ is its inverse, then the standard matrices for these operators
are related by the equation

$$T_A^{-1} = T_{A^{-1}} \qquad (6)$$

In cases where it is preferable not to assign a name to the matrix, we will write this equation as

$$[T^{-1}] = [T]^{-1} \qquad (7)$$


EXAMPLE 7 Standard Matrix for $T^{-1}$


Let $T: R^2 \to R^2$ be the operator that rotates each vector in $R^2$ through the angle $\theta$, so from Table 5
of Section 4.9,

$$[T] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \qquad (8)$$

It is evident geometrically that to undo the effect of $T$, one must rotate each vector in $R^2$ through
the angle $-\theta$. But this is exactly what the operator $T^{-1}$ does, since the standard matrix for $T^{-1}$ is

$$[T^{-1}] = [T]^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{bmatrix}$$

(verify), which is the standard matrix for a rotation through the angle $-\theta$.


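EXAMPLE 8 Finding $T^{-1}$

Show that the operator $T: R^2 \to R^2$ defined by the equations

$$w_1 = 2x_1 + x_2$$
$$w_2 = 3x_1 + 4x_2$$

is one-to-one, and find $T^{-1}(w_1, w_2)$.


The matrix form of these equations is

$$\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

so the standard matrix for $T$ is

$$[T] = \begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix}$$

This matrix is invertible (so $T$ is one-to-one) and the standard matrix for $T^{-1}$ is

$$[T^{-1}] = [T]^{-1} = \begin{bmatrix} \tfrac{4}{5} & -\tfrac{1}{5} \\ -\tfrac{3}{5} & \tfrac{2}{5} \end{bmatrix}$$

Thus

$$[T^{-1}] \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \tfrac{4}{5} & -\tfrac{1}{5} \\ -\tfrac{3}{5} & \tfrac{2}{5} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} \tfrac{4}{5}w_1 - \tfrac{1}{5}w_2 \\ -\tfrac{3}{5}w_1 + \tfrac{2}{5}w_2 \end{bmatrix}$$

from which we conclude that

$$T^{-1}(w_1, w_2) = \left(\tfrac{4}{5}w_1 - \tfrac{1}{5}w_2, \; -\tfrac{3}{5}w_1 + \tfrac{2}{5}w_2\right)$$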
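The inverse found in this example can be checked with exact rational arithmetic. The sketch below is a hedged illustration; the helper names `inverse_2x2` and `matvec` and the sample vector are assumptions. It uses the familiar formula for the inverse of a 2 x 2 matrix and confirms that the inverse operator undoes the original one.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of an invertible 2x2 matrix, computed with exact fractions."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]

T = [[2, 1], [3, 4]]
T_inv = inverse_2x2(T)
print(T_inv)

x = [1, 2]            # sample input vector
w = matvec(T, x)      # image of x under T
print(w, matvec(T_inv, w))
```

Using `Fraction` avoids round-off, so the entries of the inverse can be compared directly with the fractions 4/5, -1/5, -3/5, and 2/5 obtained above.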


Linearity  Properties 

Up to now we have focused exclusively on matrix transformations from $R^n$ to $R^m$. However, these are not the
only kinds of transformations from $R^n$ to $R^m$. For example, if $f_1, f_2, \ldots, f_m$ are any functions of the $n$
variables $x_1, x_2, \ldots, x_n$, then the equations

$$w_1 = f_1(x_1, x_2, \ldots, x_n)$$
$$w_2 = f_2(x_1, x_2, \ldots, x_n)$$
$$\vdots$$
$$w_m = f_m(x_1, x_2, \ldots, x_n)$$

define a transformation $T: R^n \to R^m$ that maps the vector $x = (x_1, x_2, \ldots, x_n)$ into the vector $(w_1, w_2, \ldots, w_m)$.
But it is only in the case where these equations are linear that $T$ is a matrix transformation. The question that we
will now consider is this:


Question

Are there algebraic properties of a transformation $T: R^n \to R^m$ that can be used to determine whether $T$ is
a matrix transformation?



The  answer  is  provided  by  the  following  theorem. 


THEOREM 4.10.2

$T: R^n \to R^m$ is a matrix transformation if and only if the following relationships hold for all vectors $u$
and $v$ in $R^n$ and for every scalar $k$:

(i) $T(u + v) = T(u) + T(v)$ [Additivity property]

(ii) $T(ku) = kT(u)$ [Homogeneity property]

Proof If $T$ is a matrix transformation, then properties (i) and (ii) follow respectively from parts (c) and (b) of
Theorem 4.9.1.

Conversely, assume that properties (i) and (ii) hold. We must show that there exists an $m \times n$ matrix $A$ such that

$$T(x) = Ax$$

for every vector $x$ in $R^n$. As a first step, recall from Formula (10) of Section 4.9 that the additivity and
homogeneity properties imply that

$$T(k_1u_1 + k_2u_2 + \cdots + k_ru_r) = k_1T(u_1) + k_2T(u_2) + \cdots + k_rT(u_r) \qquad (9)$$

for all scalars $k_1, k_2, \ldots, k_r$ and all vectors $u_1, u_2, \ldots, u_r$ in $R^n$. Let $A$ be the matrix

$$A = [\,T(e_1) \mid T(e_2) \mid \cdots \mid T(e_n)\,]$$

in which $e_1, e_2, \ldots, e_n$ are the standard basis vectors for $R^n$.

It follows from Theorem 1.3.1 that $Ax$ is a linear combination of the columns of $A$ in which the successive
coefficients are the entries $x_1, x_2, \ldots, x_n$ of $x$. That is,

$$Ax = x_1T(e_1) + x_2T(e_2) + \cdots + x_nT(e_n)$$

Using 9 we can rewrite this as

$$Ax = T(x_1e_1 + x_2e_2 + \cdots + x_ne_n) = T(x)$$

which completes the proof.

The additivity and homogeneity properties in Theorem 4.10.2 are called linearity conditions, and a transformation that satisfies these conditions is called a linear transformation. Using this terminology, Theorem 4.10.2 can be restated as follows.


THEOREM 4.10.3

Every linear transformation from $R^n$ to $R^m$ is a matrix transformation, and conversely, every matrix transformation from $R^n$ to $R^m$ is a linear transformation.
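
The construction used in the proof of Theorem 4.10.2 can be tried out numerically. The following is a minimal sketch, not part of the text: the helper names `standard_matrix` and `mat_vec` are hypothetical, and the two sample maps are the ones that appear in Exercise 17(a) and 17(b). The matrix whose columns are the images of the standard basis vectors reproduces a linear map, but fails to reproduce a map that violates the linearity conditions.

```python
def standard_matrix(T, n):
    """Return [T(e_1) | ... | T(e_n)] as a list of rows (hypothetical helper)."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]  # j-th standard basis vector
        columns.append(list(T(e_j)))
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def mat_vec(A, x):
    """Multiply the matrix A (list of rows) by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def T_linear(v):
    """T(x, y) = (2x + y, x - y), which satisfies both linearity conditions."""
    x, y = v
    return (2 * x + y, x - y)


def T_shift(v):
    """T(x, y) = (x + 1, y), which is not additive."""
    x, y = v
    return (x + 1, y)


A = standard_matrix(T_linear, 2)
B = standard_matrix(T_shift, 2)

x = [3, -4]
linear_agrees = mat_vec(A, x) == list(T_linear(x))  # Ax reproduces T(x)
shift_agrees = mat_vec(B, x) == list(T_shift(x))    # Bx does not reproduce T(x)
```

Agreement at a few sample vectors does not prove linearity; the sketch only illustrates how the matrix in the proof is assembled and why a single disagreement rules out a matrix transformation.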


More  on  the  Equivalence  Theorem 

As our final result in this section, we will add parts (b) and (c) of Theorem 4.10.1 to Theorem 4.8.10.


THEOREM 4.10.4 Equivalent Statements

If  A is  an  n x n matrix,  then  the  following  statements  are  equivalent. 

(a)  A is  invertible. 

(b)  Ax  = 0 has  only  the  trivial  solution. 

(c) The reduced row echelon form of $A$ is $I_n$.

(d) $A$ is expressible as a product of elementary matrices.

(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.

(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.

(g) $\det(A) \neq 0$.

(h) The column vectors of $A$ are linearly independent.

(i) The row vectors of $A$ are linearly independent.

(j) The column vectors of $A$ span $R^n$.

(k) The row vectors of $A$ span $R^n$.

(l) The column vectors of $A$ form a basis for $R^n$.

(m) The row vectors of $A$ form a basis for $R^n$.

(n) $A$ has rank $n$.

(o) $A$ has nullity 0.

(p) The orthogonal complement of the null space of $A$ is $R^n$.

(q) The orthogonal complement of the row space of $A$ is $\{\mathbf{0}\}$.

(r) The range of $T_A$ is $R^n$.

(s) $T_A$ is one-to-one.
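
Several of the equivalent statements can be checked side by side for a specific matrix. The sketch below is illustrative only: the helpers `det` and `rref` are hypothetical names, exact arithmetic is done with `fractions.Fraction`, and the two sample matrices are taken from Exercise 23 of this section and Exercise 19 of Section 4.11.

```python
from fractions import Fraction


def det(A):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for j, a in enumerate(A[0]):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def rref(A):
    """Return (reduced row echelon form, rank) using exact fractions."""
    M = [[Fraction(v) for v in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols):
        if pivot_row == rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pr is None:
            continue
        M[pivot_row], M[pr] = M[pr], M[pivot_row]
        pivot = M[pivot_row][col]
        M[pivot_row] = [v / pivot for v in M[pivot_row]]
        for r in range(rows):
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
    return M, pivot_row


A = [[-1, 3, 0], [2, 1, 2], [4, 5, -3]]   # matrix from Exercise 23
S = [[3, 1], [6, 2]]                      # singular matrix from Exercise 19 of Section 4.11

R_A, rank_A = rref(A)
R_S, rank_S = rref(S)
identity3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# For A: det(A) != 0, the reduced row echelon form is I_3, rank 3, nullity 0.
# For S: det(S) == 0, rank 1, nullity 1, so every statement fails together.
nullity_A = len(A[0]) - rank_A
nullity_S = len(S[0]) - rank_S
```

For the invertible matrix all of the tested statements hold at once, and for the singular matrix they all fail at once, which is the behavior the theorem describes.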


Concept  Review 

Composition  of  matrix  transformations 
Reflection  about  the  origin 
One-to-one  transformation 
Inverse  of  a matrix  operator 
Linearity  conditions 
Linear  transformation 

Equivalent  characterizations  of  invertible  matrices 

Skills 

Find  the  standard  matrix  for  a composition  of  matrix  transformations. 

Determine  whether  a matrix  operator  is  one-to-one;  if  it  is,  then  find  the  inverse  operator. 
Determine  whether  a transformation  is  a linear  transformation. 


Exercise  Set  4.10 

In Exercises 1-2, let $T_A$ and $T_B$ be the operators whose standard matrices are given. Find the standard matrices for $T_B \circ T_A$ and $T_A \circ T_B$.


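1. $A = \begin{bmatrix} 1 & -2 & 0 \\ 4 & 1 & -3 \\ 5 & 2 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & -3 & 3 \\ 5 & 0 & 1 \\ 6 & 1 & 7 \end{bmatrix}$

Answer:

$$[T_B \circ T_A] = \begin{bmatrix} 5 & -1 & 21 \\ 10 & -8 & 4 \\ 45 & 3 & 25 \end{bmatrix}, \quad [T_A \circ T_B] = \begin{bmatrix} -8 & -3 & 1 \\ -5 & -15 & -8 \\ 44 & -11 & 45 \end{bmatrix}$$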

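2. $A = \begin{bmatrix} 6 & 3 & -1 \\ 2 & 0 & 1 \\ 4 & -3 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 0 & 4 \\ -1 & 5 & 2 \\ 2 & -3 & 8 \end{bmatrix}$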

3. Let $T_1(x_1, x_2) = (x_1 + x_2,\; x_1 - x_2)$ and $T_2(x_1, x_2) = (3x_1,\; 2x_1 + 4x_2)$.

(a) Find the standard matrices for $T_1$ and $T_2$.

(b) Find the standard matrices for $T_2 \circ T_1$ and $T_1 \circ T_2$.

(c) Use the matrices obtained in part (b) to find formulas for $T_1(T_2(x_1, x_2))$ and $T_2(T_1(x_1, x_2))$.


Answer:

(c) $T_2(T_1(x_1, x_2)) = (3x_1 + 3x_2,\; 6x_1 - 2x_2)$,

$T_1(T_2(x_1, x_2)) = (5x_1 + 4x_2,\; x_1 - 4x_2)$

4. Let $T_1(x_1, x_2, x_3) = (4x_1,\; -2x_1 + x_2,\; -x_1 - 3x_2)$ and $T_2(x_1, x_2, x_3) = (x_1 + 2x_2,\; -x_3,\; 4x_1 - x_3)$.

(a) Find the standard matrices for $T_1$ and $T_2$.

(b) Find the standard matrices for $T_2 \circ T_1$ and $T_1 \circ T_2$.

(c) Use the matrices obtained in part (b) to find formulas for $T_1(T_2(x_1, x_2, x_3))$ and $T_2(T_1(x_1, x_2, x_3))$.

5. Find the standard matrix for the stated composition in $R^2$.

(a) A rotation of 90°, followed by a reflection about the line $y = x$.

(b) An orthogonal projection on the y-axis, followed by a contraction with factor $k = \frac{1}{2}$.

(c) A reflection about the x-axis, followed by a dilation with factor $k = 3$.


Answer:

(a) $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$

(b) $\begin{bmatrix} 0 & 0 \\ 0 & \frac{1}{2} \end{bmatrix}$

(c) $\begin{bmatrix} 3 & 0 \\ 0 & -3 \end{bmatrix}$

6. Find the standard matrix for the stated composition in $R^2$.

(a) A rotation of 60°, followed by an orthogonal projection on the x-axis, followed by a reflection about the line $y = x$.

(b) A dilation with factor $k = 2$, followed by a rotation of 45°, followed by a reflection about the y-axis.

(c)  A rotation  of  15°,  followed  by  a rotation  of  105°,  followed  by  a rotation  of  60°. 

7. Find the standard matrix for the stated composition in $R^3$.

(a) A reflection about the yz-plane, followed by an orthogonal projection on the xz-plane.

(b) A rotation of 45° about the y-axis, followed by a dilation with factor $k = \sqrt{2}$.

(c) An orthogonal projection on the xy-plane, followed by a reflection about the yz-plane.


Answer:

(a) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 1 \\ 0 & \sqrt{2} & 0 \\ -1 & 0 & 1 \end{bmatrix}$

(c) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$


8. Find the standard matrix for the stated composition in $R^3$.

(a) A rotation of 30° about the x-axis, followed by a rotation of 30° about the z-axis, followed by a contraction with factor $k = \frac{1}{4}$.

(b) A reflection about the xy-plane, followed by a reflection about the xz-plane, followed by an orthogonal projection on the yz-plane.

(c)  A rotation  of  270°  about  the  x-axis,  followed  by  a rotation  of  90°  about  the  y-axis,  followed  by  a rotation 
of  180°  about  the  z-axis. 


9. Determine whether $T_1 \circ T_2 = T_2 \circ T_1$.

(a) $T_1: R^2 \to R^2$ is the orthogonal projection on the x-axis, and $T_2: R^2 \to R^2$ is the orthogonal projection on the y-axis.

(b) $T_1: R^2 \to R^2$ is the rotation through an angle $\theta_1$, and $T_2: R^2 \to R^2$ is the rotation through an angle $\theta_2$.

(c) $T_1: R^2 \to R^2$ is the orthogonal projection on the x-axis, and $T_2: R^2 \to R^2$ is the rotation through an angle $\theta$.


Answer:

(a) $T_1 \circ T_2 = T_2 \circ T_1$

(b) $T_1 \circ T_2 = T_2 \circ T_1$

(c) $T_1 \circ T_2 \neq T_2 \circ T_1$

10. Determine whether $T_1 \circ T_2 = T_2 \circ T_1$.

(a) $T_1: R^3 \to R^3$ is a dilation by a factor $k$, and $T_2: R^3 \to R^3$ is the rotation about the z-axis through an angle $\theta$.

(b) $T_1: R^3 \to R^3$ is the rotation about the x-axis through an angle $\theta_1$, and $T_2: R^3 \to R^3$ is the rotation about the z-axis through an angle $\theta_2$.

11.  By  inspection,  determine  whether  the  matrix  operator  is  one-to-one. 

(a) the orthogonal projection on the x-axis in $R^2$

(b) the reflection about the y-axis in $R^2$

(c) the reflection about the line $y = x$ in $R^2$

(d) a contraction with factor $k > 0$ in $R^2$

(e) a rotation about the z-axis in $R^3$

(f) a reflection about the xy-plane in $R^3$

(g) a dilation with factor $k > 0$ in $R^3$


Answer: 


(a)  Not  one-to-one 

(b)  One-to-one 

(c)  One-to-one 

(d)  One-to-one 

(e)  One-to-one 

(f)  One-to-one 

(g)  One-to-one 

12. Find the standard matrix for the matrix operator defined by the equations, and use Theorem 4.10.4 to determine whether the operator is one-to-one.

(a) $w_1 = 8x_1 + 4x_2$
$w_2 = 2x_1 + x_2$

(b) $w_1 = 2x_1 - 3x_2$
$w_2 = 5x_1 + x_2$

(c) $w_1 = -x_1 + 3x_2 + 2x_3$
$w_2 = 2x_1 + 4x_3$
$w_3 = x_1 + 3x_2 + 6x_3$

(d) $w_1 = x_1 + 2x_2 + 3x_3$
$w_2 = 2x_1 + 5x_2 + 3x_3$
$w_3 = x_1 + 8x_3$

13. Determine whether the matrix operator $T: R^2 \to R^2$ defined by the equations is one-to-one; if so, find the standard matrix for the inverse operator, and find $T^{-1}(w_1, w_2)$.

(a) $w_1 = x_1 + 2x_2$
$w_2 = -x_1 + x_2$

(b) $w_1 = 4x_1 - 6x_2$
$w_2 = -2x_1 + 3x_2$

(c) $w_1 = -x_2$
$w_2 = -x_1$

(d) $w_1 = 3x_1$
$w_2 = -5x_1$


(d)  Not  one-to-one 


14. Determine whether the matrix operator $T: R^3 \to R^3$ defined by the equations is one-to-one; if so, find the standard matrix for the inverse operator, and find $T^{-1}(w_1, w_2, w_3)$.

(a) $w_1 = x_1 - 2x_2 + 2x_3$
$w_2 = 2x_1 + x_2 + x_3$
$w_3 = x_1 + x_2$

(b) $w_1 = x_1 - 3x_2 + 4x_3$
$w_2 = -x_1 + x_2 + x_3$
$w_3 = -2x_2 + 5x_3$

(c) $w_1 = x_1 + 4x_2 - x_3$
$w_2 = 2x_1 + 7x_2 + x_3$
$w_3 = x_1 + 3x_2$

(d) $w_1 = x_1 + 2x_2 + x_3$
$w_2 = -2x_1 + x_2 + 4x_3$
$w_3 = 7x_1 + 4x_2 - 5x_3$


15. By inspection, find the inverse of the given one-to-one matrix operator.

(a) The reflection about the x-axis in $R^2$.

(b) The rotation through an angle of $\pi/4$ in $R^2$.

(c) The dilation by a factor of 3 in $R^3$.

(d) The reflection about the yz-plane in $R^3$.

(e) The contraction by a factor of $\frac{1}{5}$ in $R^3$.


Answer:

(a) Reflection about the x-axis

(b) Rotation through the angle $-\pi/4$

(c) Contraction by a factor of $\frac{1}{3}$

(d) Reflection about the yz-plane

(e) Dilation by a factor of 5


In Exercises 16-17, use Theorem 4.10.2 to determine whether $T: R^2 \to R^2$ is a matrix operator.

16. (a) $T(x, y) = (2x, y)$

(b) $T(x, y) = (x^2, y)$

(c) $T(x, y) = (-y, x)$

(d) $T(x, y) = (x, 0)$

17. (a) $T(x, y) = (2x + y,\; x - y)$

(b) $T(x, y) = (x + 1,\; y)$

(c) $T(x, y) = (y, y)$

(d) T(x, y) =

Answer: 

(a)  Matrix  operator 

(b)  Not  a matrix  operator 

(c)  Matrix  operator 

(d)  Not  a matrix  operator 

In Exercises 18-19, use Theorem 4.10.2 to determine whether $T: R^3 \to R^2$ is a matrix transformation.

18. (a) $T(x, y, z) = (x,\; x + y + z)$

(b) $T(x, y, z) = (1, 1)$

19. (a) $T(x, y, z) = (0, 0)$

(b) $T(x, y, z) = (3x - 4y,\; 2x - 5z)$

Answer: 

(a)  Matrix  transformation 

(b)  Matrix  transformation 

20.  In  each  part,  use  Theorem  4.10.3  to  find  the  standard  matrix  for  the  matrix  operator  from  the  images  of  the 
standard  basis  vectors. 

(a) The reflection operators on $R^2$ in Table 1 of Section 4.9.

(b) The reflection operators on $R^3$ in Table 2 of Section 4.9.

(c) The projection operators on $R^2$ in Table 3 of Section 4.9.

(d) The projection operators on $R^3$ in Table 4 of Section 4.9.

(e) The rotation operators on $R^2$ in Table 5 of Section 4.9.

(f) The dilation and contraction operators on $R^3$ in Table 8 of Section 4.9.

21.  Find  the  standard  matrix  for  the  given  matrix  operator. 

(a) $T: R^2 \to R^2$ projects a vector orthogonally onto the x-axis and then reflects that vector about the y-axis.

(b) $T: R^2 \to R^2$ reflects a vector about the line $y = x$ and then reflects that vector about the x-axis.

(c) $T: R^2 \to R^2$ dilates a vector by a factor of 3, then reflects that vector about the line $y = x$, and then projects that vector orthogonally onto the y-axis.

Answer:

(a) $\begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 0 \\ 3 & 0 \end{bmatrix}$

22.  Find  the  standard  matrix  for  the  given  matrix  operator. 


(a)  T.B?  —*  R~'  reflects  a vector  about  the  xz-plane  and  then  contracts  that  vector  by  a factor  of  y 

(b) $T: R^3 \to R^3$ projects a vector orthogonally onto the xz-plane and then projects that vector orthogonally onto the xy-plane.

(c) $T: R^3 \to R^3$ reflects a vector about the xy-plane, then reflects that vector about the xz-plane, and then reflects that vector about the yz-plane.


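23. Let $T_A: R^3 \to R^3$ be multiplication by

$$A = \begin{bmatrix} -1 & 3 & 0 \\ 2 & 1 & 2 \\ 4 & 5 & -3 \end{bmatrix}$$

and let $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ be the standard basis vectors for $R^3$. Find the following vectors by inspection.

(a) $T_A(\mathbf{e}_1)$, $T_A(\mathbf{e}_2)$, and $T_A(\mathbf{e}_3)$

(b) $T_A(\mathbf{e}_1 + \mathbf{e}_2 + \mathbf{e}_3)$

(c) $T_A(7\mathbf{e}_3)$


Answer:

(a) $T_A(\mathbf{e}_1) = (-1, 2, 4)$, $T_A(\mathbf{e}_2) = (3, 1, 5)$, $T_A(\mathbf{e}_3) = (0, 2, -3)$

(b) $T_A(\mathbf{e}_1 + \mathbf{e}_2 + \mathbf{e}_3) = (2, 5, 6)$

(c) $T_A(7\mathbf{e}_3) = (0, 14, -21)$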

24.  Determine  whether  multiplication  by  A is  a one-to-one  matrix  transformation. 


(a) 

1 -1 

A = 

2 0 

i 

i 

CO 

1 

(Va= 

1 2 

3 

-1  0 

-4 

(c) 

1 2 

f 

A 

0 1 

1 

Si  — 

1 1 

0 

1 0 - 

-1 

25. (a) Is a composition of one-to-one matrix transformations one-to-one? Justify your conclusion.

(b)  Can  the  composition  of  a one-to-one  matrix  transformation  and  a matrix  transformation  that  is  not 
one-to-one  be  one-to-one?  Account  for  both  possible  orders  of  composition  and  justify  your  conclusion. 


Answer: 


(a)  Yes 

(b)  Yes 

26. Show that $T(x, y) = (0, 0)$ defines a matrix operator on $R^2$ but $T(x, y) = (1, 1)$ does not.

27. (a) Prove: If $T: R^n \to R^m$ is a matrix transformation, then $T(\mathbf{0}) = \mathbf{0}$; that is, $T$ maps the zero vector in $R^n$ into the zero vector in $R^m$.

(b)  The  converse  of  this  is  not  true.  Find  an  example  of  a function  that  satisfies  T(0)  = 0 but  is  not  a matrix 
transformation. 

Answer: 

(b) $T(x_1, x_2) = (x_1 + x_2,\; x_1 x_2)$

28. Prove: An $n \times n$ matrix $A$ is invertible if and only if the linear system $A\mathbf{x} = \mathbf{w}$ has exactly one solution for every vector $\mathbf{w}$ in $R^n$ for which the system is consistent.

29. Let $A$ be an $n \times n$ matrix such that $\det(A) = 0$, and let $T: R^n \to R^n$ be multiplication by $A$.

(a) What can you say about the range of the matrix operator $T$? Give an example that illustrates your conclusion.

(b) What can you say about the number of vectors that $T$ maps into $\mathbf{0}$?

Answer:

(a) The range of $T$ is a proper subset of $R^n$.

(b) $T$ must map infinitely many vectors to $\mathbf{0}$.

30. Prove: If the matrix transformation $T_A: R^n \to R^n$ is one-to-one, then $A$ is invertible.

True-False  Exercises 

In  parts  (a)-(f)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a) If $T: R^n \to R^m$ and $T(\mathbf{0}) = \mathbf{0}$, then $T$ is a matrix transformation.

Answer:

False

(b) If $T: R^n \to R^m$ and $T(c_1\mathbf{x} + c_2\mathbf{y}) = c_1T(\mathbf{x}) + c_2T(\mathbf{y})$ for all scalars $c_1$ and $c_2$ and all vectors $\mathbf{x}$ and $\mathbf{y}$ in $R^n$, then $T$ is a matrix transformation.

Answer:

True

(c) If $T: R^n \to R^m$ is a one-to-one matrix transformation, then there are no distinct vectors $\mathbf{x}$ and $\mathbf{y}$ for which $T(\mathbf{x} - \mathbf{y}) = \mathbf{0}$.

Answer:

True

(d) If $T: R^n \to R^m$ is a matrix transformation and $m > n$, then $T$ is one-to-one.

Answer:

False

(e) If $T: R^n \to R^m$ is a matrix transformation and $m = n$, then $T$ is one-to-one.

Answer:

False

(f) If $T: R^n \to R^m$ is a matrix transformation and $m < n$, then $T$ is one-to-one.

Answer:

False


4.11 Geometry of Matrix Operators on $R^2$

In this optional section we will discuss matrix operators on $R^2$ in a little more depth. The ideas that we will develop here have important applications to computer graphics.


Transformations of Regions

In Section 4.9 we focused on the effect that a matrix operator has on individual vectors in $R^2$ and $R^3$. However, it is also important to understand how such operators affect the shapes of regions. For example, Figure 4.11.1 shows a famous picture of Albert Einstein and three computer-generated modifications of that image that result from matrix operators on $R^2$. The original picture was scanned and then digitized to decompose it into a rectangular array of pixels. The pixels were then transformed as follows:

The  program  MATLAB  was  used  to  assign  coordinates  and  a gray  level  to  each  pixel. 

The  coordinates  of  the  pixels  were  transformed  by  matrix  multiplication. 

The  pixels  were  then  assigned  their  original  gray  levels  to  produce  the  transformed  picture. 
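
The three steps above can be imitated on a toy scale. The text used MATLAB; the sketch below uses plain Python instead, and the tiny four-pixel "image", the dictionary layout, and the function name `transform_pixels` are all assumptions made for illustration. Each pixel keeps its gray level while its coordinates are multiplied by a 2 x 2 matrix.

```python
def transform_pixels(pixels, A):
    """Map each pixel coordinate (x, y) to A(x, y), keeping its gray level.

    pixels: dict mapping (x, y) -> gray level (hypothetical layout).
    A: 2 x 2 matrix given as a list of rows.
    """
    (a, b), (c, d) = A
    return {(a * x + b * y, c * x + d * y): gray
            for (x, y), gray in pixels.items()}


# Step 1: assign coordinates and a gray level to each pixel (made-up values).
pixels = {(0, 0): 10, (1, 0): 80, (0, 1): 160, (1, 1): 255}

# Step 2: transform the coordinates by matrix multiplication (a shear here).
shear = [[1, 2], [0, 1]]

# Step 3: the transformed pixels carry their original gray levels.
sheared = transform_pixels(pixels, shear)
```

With an invertible matrix distinct pixels land on distinct coordinates, so no gray levels are lost; a singular matrix could send several pixels to the same location, and this sketch would then keep only one of them.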


Figure  4.11.1 


The overall effect of a matrix operator on $R^2$ can often be ascertained by graphing the images of the vertices
(0,  0),  (1,  0),  (0,  1),  and  (1,  1)  of  the  unit  square  (Figure  4.11.2).  Table  1 shows  the  effect  that  some  of  the  matrix 
operators  studied  in  Section  4.9  have  on  the  unit  square.  For  clarity,  we  have  shaded  a portion  of  the  original  square  and 
its  corresponding  image. 


Figure 4.11.2 [panels: unit square; unit square reflected about the y-axis; unit square reflected about the line y = x; unit square projected onto the x-axis]


Table  1 


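Columns: Operator; Standard Matrix; Effect on the Unit Square

Reflection about the y-axis; $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$; $(1, 1) \to (-1, 1)$

Reflection about the x-axis; $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$; $(1, 1) \to (1, -1)$

Reflection about the line $y = x$; $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$; $(1, 1) \to (1, 1)$

Counterclockwise rotation through an angle $\theta$; $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$; $(1, 1) \to (\cos\theta - \sin\theta,\; \sin\theta + \cos\theta)$

Compression in the x-direction by a factor of $k$ $(0 < k < 1)$; $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$; $(1, 1) \to (k, 1)$

Expansion in the x-direction by a factor of $k$ $(k > 1)$; $\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}$; $(1, 1) \to (k, 1)$

Shear in the x-direction with factor $k > 0$; $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$; $(x, y) \to (x + ky,\; y)$

Shear in the x-direction with factor $k < 0$; $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$; $(x, y) \to (x + ky,\; y)$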


EXAMPLE  1 Transforming  with  Diagonal  Matrices 


Suppose that the xy-plane first is compressed or expanded by a factor of $k_1$ in the x-direction and then is compressed or expanded by a factor of $k_2$ in the y-direction. Find a single matrix operator that performs both operations.


Solution The standard matrices for the two operations are

$$\underbrace{\begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix}}_{\text{x-compression (expansion)}} \qquad \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & k_2 \end{bmatrix}}_{\text{y-compression (expansion)}}$$

Thus, the standard matrix for the composition of the x-operation followed by the y-operation is

$$A = \begin{bmatrix} 1 & 0 \\ 0 & k_2 \end{bmatrix}\begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix} \tag{1}$$

This shows that multiplication by a diagonal $2 \times 2$ matrix compresses or expands the plane in the x-direction and also in the y-direction. In the special case where $k_1$ and $k_2$ are the same, say $k_1 = k_2 = k$, Formula (1) simplifies to

$$A = \begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}$$

which is a contraction or a dilation (Table 7 of Section 4.9).


EXAMPLE  2 Finding  Matrix  Operators 

(a) Find the standard matrix for the operator on $R^2$ that first shears by a factor of 2 in the x-direction and then reflects the result about the line $y = x$. Sketch the image of the unit square under this operator.

(b) Find the standard matrix for the operator on $R^2$ that first reflects about $y = x$ and then shears by a factor of 2 in the x-direction. Sketch the image of the unit square under this operator.

(c) Confirm that the shear and the reflection in parts (a) and (b) do not commute.


Solution

(a) The standard matrix for the shear is

$$A_1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$$

and for the reflection is

$$A_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

Thus, the standard matrix for the shear followed by the reflection is

$$A_2A_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix}$$

(b) The standard matrix for the reflection followed by the shear is

$$A_1A_2 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix}$$

(c) The computations in Solutions (a) and (b) show that $A_1A_2 \neq A_2A_1$, so the standard matrices, and hence the operators, do not commute. The same conclusion follows from Figures 4.11.3 and 4.11.4, since the two operators produce different images of the unit square.

Figure 4.11.3

[Figure: unit square, shear in the x-direction with k = 2, reflection about y = x; labeled vertices (3, 1) and (1, 3)]

Figure 4.11.4
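
The two products in Example 2 are easy to confirm by direct computation. The following sketch is only a check on the arithmetic; the helper names `mat_mul` and `apply` are hypothetical, and matrices are stored as lists of rows.

```python
def mat_mul(A, B):
    """Product AB of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def apply(A, p):
    """Image of the point p = (x, y) under multiplication by the 2 x 2 matrix A."""
    x, y = p
    return (A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)


A1 = [[1, 2], [0, 1]]   # shear by a factor of 2 in the x-direction
A2 = [[0, 1], [1, 0]]   # reflection about the line y = x

shear_then_reflect = mat_mul(A2, A1)   # the operator applied first stands on the right
reflect_then_shear = mat_mul(A1, A2)

# Images of the vertices of the unit square under each composition.
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
image_a = [apply(shear_then_reflect, p) for p in square]
image_b = [apply(reflect_then_shear, p) for p in square]
```

The vertex (1, 1) goes to (1, 3) under the shear followed by the reflection and to (3, 1) under the reflection followed by the shear, which is one concrete way to see that the operators do not commute.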


Geometry  of  One-to-One  Matrix  Operators 

We will now turn our attention to one-to-one matrix operators on $R^2$, which are important because they map distinct points into distinct points. Recall from Theorem 4.10.4 (the Equivalence Theorem) that a matrix transformation $T_A$ is one-to-one if and only if $A$ can be expressed as a product of elementary matrices. Thus, we can analyze the effect of any one-to-one transformation $T_A$ by first factoring the matrix $A$ into a product of elementary matrices, say

$$A = E_1E_2 \cdots E_r$$

and then expressing $T_A$ as the composition

$$T_A = T_{E_1E_2 \cdots E_r} = T_{E_1} \circ T_{E_2} \circ \cdots \circ T_{E_r} \tag{2}$$


The  following  theorem  explains  the  geometric  effect  of  matrix  operators  corresponding  to  elementary  matrices. 


THEOREM  4.11.1 


If $E$ is an elementary matrix, then $T_E: R^2 \to R^2$ is one of the following:

(a)  A shear  along  a coordinate  axis. 

(b)  A reflection  about  y = x. 

(c)  A compression  along  a coordinate  axis. 

(d)  An  expansion  along  a coordinate  axis. 

(e)  A reflection  about  a coordinate  axis. 

(f)  A compression  or  expansion  along  a coordinate  axis  followed  by  a reflection  about  a coordinate  axis. 


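Because a $2 \times 2$ elementary matrix results from performing a single elementary row operation on the $2 \times 2$ identity matrix, such a matrix must have one of the following forms (verify):

$$\begin{bmatrix} 1 & 0 \\ k & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix}$$

The first two matrices represent shears along coordinate axes, and the third represents a reflection about $y = x$. If $k > 0$, the last two matrices represent compressions or expansions along coordinate axes, depending on whether $0 < k < 1$ or $k > 1$. If $k < 0$, and if we express $k$ in the form $k = -k_1$ where $k_1 > 0$, then the last two matrices can be written as

$$\begin{bmatrix} k & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -k_1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix} \tag{3}$$

$$\begin{bmatrix} 1 & 0 \\ 0 & k \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -k_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & k_1 \end{bmatrix} \tag{4}$$

Since $k_1 > 0$, the product in (3) represents a compression or expansion along the x-axis followed by a reflection about the y-axis, and (4) represents a compression or expansion along the y-axis followed by a reflection about the x-axis. In the case where $k = -1$, transformations (3) and (4) are simply reflections about the y-axis and x-axis, respectively.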


Since  every  invertible  matrix  is  a product  of  elementary  matrices,  the  following  result  follows  from  Theorem  4.11.1  and 
Formula  2. 


THEOREM  4.11.2 

If $T: R^2 \to R^2$ is multiplication by an invertible matrix $A$, then the geometric effect of $T$ is the same as an
appropriate  succession  of  shears,  compressions,  expansions,  and  reflections. 


EXAMPLE  3 Analyzing  the  Geometric  Effect  of  a Matrix  Operator 


Assuming that $k_1$ and $k_2$ are positive, express the diagonal matrix

$$A = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix}$$

as a product of elementary matrices, and describe the geometric effect of multiplication by $A$ in terms of compressions and expansions.


Solution From Example 1 we have

$$A = \begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & k_2 \end{bmatrix}\begin{bmatrix} k_1 & 0 \\ 0 & 1 \end{bmatrix}$$

which shows that multiplication by $A$ has the geometric effect of compressing or expanding by a factor of $k_1$ in the x-direction and then compressing or expanding by a factor of $k_2$ in the y-direction.


EXAMPLE  4 Analyzing  the  Geometric  Effect  of  a Matrix  Operator 


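Express

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$

as a product of elementary matrices, and then describe the geometric effect of multiplication by $A$ in terms of shears, compressions, expansions, and reflections.


Solution $A$ can be reduced to $I$ as follows:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \to \begin{bmatrix} 1 & 2 \\ 0 & -2 \end{bmatrix} \to \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

The three steps are: add $-3$ times the first row to the second; multiply the second row by $-\frac{1}{2}$; add $-2$ times the second row to the first.

The three successive row operations can be performed by multiplying $A$ on the left successively by

$$E_1 = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 \\ 0 & -\frac{1}{2} \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$$

Inverting these matrices and using Formula 4 of Section 1.5 yields

$$A = E_1^{-1}E_2^{-1}E_3^{-1} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$$

Reading from right to left and noting that

$$\begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$$

it follows that the effect of multiplying by $A$ is equivalent to
shearing by a factor of 2 in the x-direction,
then expanding by a factor of 2 in the y-direction,
then reflecting about the x-axis,
then shearing by a factor of 3 in the y-direction.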
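
The factorization in Example 4 can be verified by multiplying the four elementary operations back together. This is a small check rather than a general factoring routine; the helper name `mat_mul` and the variable names are hypothetical.

```python
from functools import reduce


def mat_mul(A, B):
    """Product AB of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


shear_x_2 = [[1, 2], [0, 1]]     # shear by a factor of 2 in the x-direction
expand_y_2 = [[1, 0], [0, 2]]    # expansion by a factor of 2 in the y-direction
reflect_x = [[1, 0], [0, -1]]    # reflection about the x-axis
shear_y_3 = [[1, 0], [3, 1]]     # shear by a factor of 3 in the y-direction

# The operation performed first is the rightmost factor of the product.
factors = [shear_y_3, reflect_x, expand_y_2, shear_x_2]
A = reduce(mat_mul, factors)
```

Reordering the list `factors` generally gives a different matrix, which reflects the fact that the order of the geometric operations matters.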


Images  of  Lines  Under  Matrix  Operators 


Many  images  in  computer  graphics  are  constructed  by  connecting  points  with  line  segments.  The  following  theorem, 
some  of  whose  parts  are  proved  in  the  exercises,  is  helpful  for  understanding  how  matrix  operators  transform  such 
figures. 


THEOREM  4.11.3 

If $T: R^2 \to R^2$ is multiplication by an invertible matrix, then:

(a)  The  image  of  a straight  line  is  a straight  line. 

(b)  The  image  of  a straight  line  through  the  origin  is  a straight  line  through  the  origin. 

(c)  The  images  of  parallel  straight  lines  are  parallel  straight  lines. 

(d)  The  image  of  the  line  segment  joining  points  P and  Q is  the  line  segment  joining  the  images  of  P and  Q. 

(e)  The  images  of  three  points  lie  on  a line  if  and  only  if  the  points  themselves  lie  on  a line. 


Note  that  it  follows  from  Theorem  4.11.3  that  if  A is 
an  invertible  2x2  matrix,  then  multiplication  by  A 
maps  triangles  into  triangles  and  parallelograms  into 
parallelograms. 


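EXAMPLE 5 Image of a Square

Sketch the image of the square with vertices $(0, 0)$, $(1, 0)$, $(1, 1)$, and $(0, 1)$ under multiplication by

$$A = \begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix}$$


Solution Since

$$\begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$$

$$\begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}, \quad \begin{bmatrix} -1 & 2 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

the image of the square is a parallelogram with vertices $(0, 0)$, $(-1, 2)$, $(2, -1)$, and $(1, 1)$ (Figure 4.11.5).

Figure 4.11.5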


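EXAMPLE 6 Image of a Line

According to Theorem 4.11.3, the invertible matrix

$$A = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}$$

maps the line $y = 2x + 1$ into another line. Find its equation.

Solution Let $(x, y)$ be a point on the line $y = 2x + 1$, and let $(x', y')$ be its image under multiplication by $A$. Then

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3 & 1 \\ 2 & 1 \end{bmatrix}^{-1}\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ -2 & 3 \end{bmatrix}\begin{bmatrix} x' \\ y' \end{bmatrix}$$

so

$$x = x' - y'$$

$$y = -2x' + 3y'$$

Substituting in $y = 2x + 1$ yields

$$-2x' + 3y' = 2(x' - y') + 1 \quad \text{or equivalently} \quad y' = \tfrac{4}{5}x' + \tfrac{1}{5}$$

Thus $(x', y')$ satisfies

$$y = \tfrac{4}{5}x + \tfrac{1}{5}$$

which is the equation we want.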
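
The result of Example 6 can be cross-checked by mapping a few sample points of the line and fitting a line through their images. The sketch below is a check under the assumption that five integer sample points are enough for illustration; the helper name `apply` is hypothetical.

```python
from fractions import Fraction

A = [[3, 1], [2, 1]]


def apply(A, p):
    """Image of the point p = (x, y) under multiplication by the 2 x 2 matrix A."""
    x, y = p
    return (A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)


# Sample points on the line y = 2x + 1 and their images.
points = [(x, 2 * x + 1) for x in range(-2, 3)]
images = [apply(A, p) for p in points]

# Slope and intercept of the line through the first two image points.
(x0, y0), (x1, y1) = images[0], images[1]
slope = Fraction(y1 - y0, x1 - x0)
intercept = y0 - slope * x0

# Every image point should lie on that same line.
all_on_line = all(y == slope * x + intercept for x, y in images)
```

The computed slope and intercept agree with the equation obtained by substitution, and the fact that all sampled images are collinear is consistent with part (a) of Theorem 4.11.3.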


Concept  Review 

Effect  of  a matrix  operator  on  the  unit  square 
Geometry  of  one-to-one  matrix  operators 
Images  of  lines  under  matrix  operators 

Skills 

Find standard matrices for geometric transformations of $R^2$.
Describe  the  geometric  effect  of  an  invertible  matrix  operator. 
Find  the  image  of  the  unit  square  under  a matrix  operator. 
Find  the  image  of  a line  under  a matrix  operator. 


Exercise  Set  4.11 

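1. Find the standard matrix for the operator $T: R^2 \to R^2$ that maps a point $(x, y)$ into

(a) its reflection about the line $y = -x$.

(b) its reflection through the origin.

(c) its orthogonal projection on the x-axis.

(d) its orthogonal projection on the y-axis.


Answer:

(a) $\begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$

(b) $\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$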


2. For each part of Exercise 1, use the matrix you have obtained to compute $T(2, 1)$. Check your answers geometrically by plotting the points $(2, 1)$ and $T(2, 1)$.

3. Find the standard matrix for the operator $T: R^3 \to R^3$ that maps a point $(x, y, z)$ into

(a) its reflection through the xy-plane.

(b) its reflection through the xz-plane.

(c) its reflection through the yz-plane.


Answer:

(a) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(c) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

4. For each part of Exercise 3, use the matrix you have obtained to compute $T(1, 1, 1)$. Check your answers geometrically by plotting the points $(1, 1, 1)$ and $T(1, 1, 1)$.

5. Find the standard matrix for the operator $T: R^3 \to R^3$ that

(a) rotates each vector 90° counterclockwise about the z-axis (looking along the positive z-axis toward the origin).

(b) rotates each vector 90° counterclockwise about the x-axis (looking along the positive x-axis toward the origin).

(c) rotates each vector 90° counterclockwise about the y-axis (looking along the positive y-axis toward the origin).


Answer:

(a) $\begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ -1 & 0 & 0 \end{bmatrix}$

6.  Sketch  the  image  of  the  rectangle  with  vertices  (0,  0),  (1,  0),  (1,  2),  and  (0,  2)  under 

(a)  a reflection  about  the  x-axis. 

(b)  a reflection  about  the  y-axis. 

(c)  a compression  of  factor  k = in  the  y-direction. 

(d) an expansion of factor $k = 2$ in the x-direction.

(e) a shear of factor $k = 3$ in the x-direction.

(f) a shear of factor $k = 2$ in the y-direction.

7.  Sketch  the  image  of  the  square  with  vertices  (0,0),  (1,0),  (0,  1),  and  (1,1)  under  multiplication  by 


Answer: 

Rectangle  with  vertices  at  (0,  0),  (—3,  0),  (0,  1),  (—3,  1) 

8. Find the matrix that rotates a point $(x, y)$ about the origin

(a) 45°

(b) 90°

(c) 180°

(d) 270°

(e) -30°


9. Find the matrix that shears by

(a) a factor of $k = 4$ in the y-direction.

(b) a factor of $k = -2$ in the x-direction.


Answer:

(a) $\begin{bmatrix} 1 & 0 \\ 4 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$


10.  Find  the  matrix  that  compresses  or  expands  by 

(a)  a factor  of  -j  in  they-direction. 

(b)  a factor  of  6 in  the  x-direction. 


11. In each part, describe the geometric effect of multiplication by $A$.

(a) $A = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 0 \\ 0 & -5 \end{bmatrix}$

(c) $A = \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}$


Answer:

(a) Expansion by a factor of 3 in the x-direction

(b) Expansion by a factor of 5 in the y-direction and reflection about the x-axis

(c) Shearing by a factor of 4 in the x-direction


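12. In each part, express the matrix as a product of elementary matrices, and then describe the effect of multiplication by $A$ in terms of compressions, expansions, reflections, and shears.

(a) $A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 4 \\ 2 & 9 \end{bmatrix}$

(c) $A = \begin{bmatrix} 0 & -2 \\ 4 & 0 \end{bmatrix}$

(d) $A = \begin{bmatrix} 1 & -3 \\ 4 & 6 \end{bmatrix}$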


13.  In  each  part,  find  a single  matrix  that  performs  the  indicated  succession  of  operations. 

(a)  Compresses  by  a factor  of  -1  in  the  x-direction,  then  expands  by  a factor  of  5 in  the  y-direction. 


(b) Expands by a factor of 5 in the y-direction, then shears by a factor of 2 in the y-direction.

(c) Reflects about $y = x$, then rotates through an angle of 180° about the origin.


Answer: 


(a) 

(b) 

(c) 


0 5J 

1 0" 

2 5. 

0 -1 
-1  0 


14.  In  each  part,  find  a single  matrix  that  performs  the  indicated  succession  of  operations. 

(a) Reflects about the y-axis, then expands by a factor of 5 in the x-direction, and then reflects about y = x.

(b) Rotates through 30° about the origin, then shears by a factor of -2 in the y-direction, and then expands by a factor of 3 in the y-direction.


15.  Use  matrix  inversion  to  show  the  following. 

(a)  The  inverse  transformation  for  a reflection  about  y = x is  a reflection  about  y = x. 

(b)  The  inverse  transformation  for  a compression  along  an  axis  is  an  expansion  along  that  axis. 

(c)  The  inverse  transformation  for  a reflection  about  a coordinate  axis  is  a reflection  about  that  axis. 

(d)  The  inverse  transformation  for  a shear  along  a coordinate  axis  is  a shear  along  that  axis. 


16. Find an equation of the image of the line y = -4x + 3 under multiplication by


17. In parts (a) through (e), find an equation of the image of the line y = 2x under

(a) a shear of factor 3 in the x-direction.

(b) a compression of factor 1/2 in the y-direction.

(c) a reflection about y = x.

(d) a reflection about the y-axis.

(e) a rotation of 60° about the origin.


Answer:

(a) $y = \tfrac{2}{7}x$

(b) $y = x$

(c) $y = \tfrac{1}{2}x$

(d) $y = -2x$

(e) $y = -\dfrac{8 + 5\sqrt{3}}{11}\,x$


18.  Find  the  matrix  for  a shear  in  the  x-direction  that  transforms  the  triangle  with  vertices  (0,  0),  (2,  1),  and  (3,  0)  into 
a right  triangle  with  the  right  angle  at  the  origin. 


19. (a) Show that multiplication by

\[ A = \begin{bmatrix} 3 & 1 \\ 6 & 2 \end{bmatrix} \]

maps each point in the plane onto the line y = 2x.


(b) It follows from part (a) that the noncollinear points (1, 0), (0, 1), (-1, 0) are mapped onto a line. Does this violate part (e) of Theorem 4.11.3?

Answer: 

(b)  No 

20. Prove part (a) of Theorem 4.11.3. [Hint: A line in the plane has an equation of the form Ax + By + C = 0, where A and B are not both zero. Use the method of Example 6 to show that the image of this line under multiplication by the invertible matrix

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

has the equation A'x + B'y + C = 0, where

\[ A' = \frac{dA - cB}{ad - bc} \]

and

\[ B' = \frac{-bA + aB}{ad - bc} \]

Then show that A' and B' are not both zero to conclude that the image is a line.]

21. Use the hint in Exercise 20 to prove parts (b) and (c) of Theorem 4.11.3.

22.  In  each  part  of  the  accompanying  figure,  find  the  standard  matrix  for  the  operator  described. 


23. In $R^3$ the shear in the xy-direction with factor k is the matrix transformation that moves each point (x, y, z) parallel to the xy-plane to the new position (x + kz, y + kz, z). (See the accompanying figure.)

(a) Find the standard matrix for the shear in the xy-direction with factor k.

(b) How would you define the shear in the xz-direction with factor k and the shear in the yz-direction with factor k? Find the standard matrices for these matrix transformations.


Figure  Ex-23 


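Answer:

(a) \[ \begin{bmatrix} 1 & 0 & k \\ 0 & 1 & k \\ 0 & 0 & 1 \end{bmatrix} \]

(b) Shear in the xz-direction with factor k maps (x, y, z) to (x + ky, y, z + ky):

\[ \begin{bmatrix} 1 & k & 0 \\ 0 & 1 & 0 \\ 0 & k & 1 \end{bmatrix} \]

Shear in the yz-direction with factor k maps (x, y, z) to (x, y + kx, z + kx):

\[ \begin{bmatrix} 1 & 0 & 0 \\ k & 1 & 0 \\ k & 0 & 1 \end{bmatrix} \]

True-False Exercises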


In  parts  (a)-(g)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  The  image  of  the  unit  square  under  a one-to-one  matrix  operator  is  a square. 

Answer: 

False 

(b)  A 2 x 2 invertible  matrix  operator  has  the  geometric  effect  of  a succession  of  shears,  compressions,  expansions,  and 
reflections. 

Answer: 

True 

(c)  The  image  of  a line  under  a one-to-one  matrix  operator  is  a line. 

Answer: 

True 

(d) Every reflection operator on $R^2$ is its own inverse.

Answer: 

True 

(e) The matrix


~[:  -i 


represents  reflection  about  a line. 


Answer: 

False 

(f) The matrix

\[ \begin{bmatrix} 1 & -2 \\ 2 & 1 \end{bmatrix} \]

represents a shear.


Answer: 


False 


(g) The matrix

\[ \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix} \]

represents an expansion.


Answer: 


True 
4.12 Dynamical Systems and Markov Chains

In  this  optional  section  we  will  show  how  matrix  methods  can  be  used  to  analyze  the  behavior  of  physical  systems  that 
evolve  over  time.  The  methods  that  we  will  study  here  have  been  applied  to  problems  in  business,  ecology, 
demographics,  sociology,  and  most  of  the  physical  sciences. 


Dynamical  Systems 

A dynamical  system  is  a finite  set  of  variables  whose  values  change  with  time.  The  value  of  a variable  at  a point  in  time 
is  called  the  state  of  the  variable  at  that  time,  and  the  vector  formed  from  these  states  is  called  the  state  of  the 
dynamical  system  at  that  time.  Our  primary  objective  in  this  section  is  to  analyze  how  the  state  of  a dynamical  system 
changes  with  time.  Let  us  begin  with  an  example. 

EXAMPLE  1 Market  Share  as  a Dynamical  System 

Suppose  that  two  competing  television  channels,  channel  1 and  channel  2,  each  have  50%  of  the  viewer 
market  at  some  initial  point  in  time.  Assume  that  over  each  one-year  period  channel  1 captures  10%  of 
channel 2's share, and channel 2 captures 20% of channel 1's share (see Figure 4.12.1). What is each
channel's  market  share  after  one  year? 

Figure 4.12.1 Channel 1 loses 20% and holds 80%. Channel 2 loses 10% and holds 90%.


Let us begin by introducing the time-dependent variables

$x_1(t)$ = fraction of the market held by channel 1 at time t

$x_2(t)$ = fraction of the market held by channel 2 at time t

and the column vector

\[ x(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} \]

whose entries are channel 1's and channel 2's fractions of the market at time t in years.

The variables $x_1(t)$ and $x_2(t)$ form a dynamical system whose state at time t is the vector $x(t)$. If we take t = 0 to be the starting point at which the two channels had 50% of the market, then the state of the system at that time is

\[ x(0) = \begin{bmatrix} x_1(0) \\ x_2(0) \end{bmatrix} = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} \tag{1} \]


Now let us try to find the state of the system at time t = 1 (one year later). Over the one-year period, channel 1 retains 80% of its initial 50%, and it gains 10% of channel 2's initial 50%. Thus,

\[ x_1(1) = 0.8(0.5) + 0.1(0.5) = 0.45 \tag{2} \]

Similarly, channel 2 gains 20% of channel 1's initial 50%, and retains 90% of its initial 50%. Thus,

\[ x_2(1) = 0.2(0.5) + 0.9(0.5) = 0.55 \tag{3} \]

Therefore, the state of the system at time t = 1 is

\[ x(1) = \begin{bmatrix} x_1(1) \\ x_2(1) \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} \tag{4} \]


EXAMPLE  2 Evolution  of  Market  Share  over  Five  Years 

Track  the  market  shares  of  channels  1 and  2 in  Example  1 over  a five-year  period. 

To solve this problem suppose that we have already computed the market share of each channel at time t = k and we are interested in using the known values of $x_1(k)$ and $x_2(k)$ to compute the market shares $x_1(k+1)$ and $x_2(k+1)$ one year later. The analysis is exactly the same as that used to obtain Equations 2 and 3. Over the one-year period, channel 1 retains 80% of its starting fraction $x_1(k)$ and gains 10% of channel 2's starting fraction $x_2(k)$. Thus,

\[ x_1(k+1) = (0.8)\,x_1(k) + (0.1)\,x_2(k) \tag{5} \]

Similarly, channel 2 gains 20% of channel 1's starting fraction $x_1(k)$ and retains 90% of its own starting fraction $x_2(k)$. Thus,

\[ x_2(k+1) = (0.2)\,x_1(k) + (0.9)\,x_2(k) \tag{6} \]


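Equations 5 and 6 can be expressed in matrix form as

\[ \begin{bmatrix} x_1(k+1) \\ x_2(k+1) \end{bmatrix} = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \begin{bmatrix} x_1(k) \\ x_2(k) \end{bmatrix} \tag{7} \]

which provides a way of using matrix multiplication to compute the state of the system at time t = k + 1 from the state at time t = k. For example, using 1 and 7 we obtain

\[ x(1) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} x(0) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} \]

which agrees with 4. Similarly,

\[ x(2) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} x(1) = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} = \begin{bmatrix} 0.415 \\ 0.585 \end{bmatrix} \]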

We  can  now  continue  this  process,  using  Formula  7 to  compute  x(3)  from  x(2),  then  x(4)  from  x(3), 
and  so  on.  This  yields  (verify) 


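\[ x(3) = \begin{bmatrix} 0.3905 \\ 0.6095 \end{bmatrix}, \quad x(4) = \begin{bmatrix} 0.37335 \\ 0.62665 \end{bmatrix}, \quad x(5) = \begin{bmatrix} 0.361345 \\ 0.638655 \end{bmatrix} \tag{8} \]

Thus, after five years, channel 1 will hold about 36% of the market and channel 2 will hold about 64% of the market.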


If  desired,  we  can  continue  the  market  analysis  in  the  last  example  beyond  the  five-year  period  and  explore  what 
happens  to  the  market  share  over  the  long  term.  We  did  so,  using  a computer,  and  obtained  the  following  state  vectors 
(rounded  to  six  decimal  places): 


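\[ x(10) = \begin{bmatrix} 0.338041 \\ 0.661959 \end{bmatrix}, \quad x(20) = \begin{bmatrix} 0.333466 \\ 0.666534 \end{bmatrix}, \quad x(40) = \begin{bmatrix} 0.333333 \\ 0.666667 \end{bmatrix} \tag{9} \]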


All  subsequent  state  vectors,  when  rounded  to  six  decimal  places,  are  the  same  as  x(40),  so  we  see  that  the  market 
shares  eventually  stabilize  with  channel  1 holding  about  one-third  of  the  market  and  channel  2 holding  about 
two-thirds.  Later  in  this  section,  we  will  explain  why  this  stabilization  occurs. 
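
The iteration used in Examples 1 and 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `step` and `state_after` are hypothetical (they do not come from the text), and plain lists stand in for matrices and vectors.

```python
def step(P, x):
    """Return the matrix-vector product P x (P is a list of rows, x a list)."""
    return [sum(p_ij * x_j for p_ij, x_j in zip(row, x)) for row in P]


def state_after(P, x0, k):
    """Apply x(j+1) = P x(j) a total of k times, starting from x0."""
    x = list(x0)
    for _ in range(k):
        x = step(P, x)
    return x


# Coefficient matrix of Formula 7 and the initial state of Formula 1.
P = [[0.8, 0.1],
     [0.2, 0.9]]
x0 = [0.5, 0.5]

for k in (1, 2, 3, 4, 5, 10, 20, 40):
    x = state_after(P, x0, k)
    print(k, [round(v, 6) for v in x])
```

Each printed line shows a year k and the two market shares rounded to six decimal places, so the output can be compared with the state vectors listed in 8 and 9.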


Markov  Chains 

In many dynamical systems the states of the variables are not known with certainty but can be expressed as probabilities; such dynamical systems are called stochastic processes (from the Greek word stokastikos, meaning “proceeding by guesswork”). A detailed study of stochastic processes requires a precise definition of the term probability, which is outside the scope of this course. However, the following interpretation will suffice for our present purposes:


Stated informally, the probability that an experiment or observation will have a certain outcome is approximately the fraction of the time that the outcome would occur if the experiment were to be repeated many times under constant conditions; the greater the number of repetitions, the more accurately the probability describes the fraction of occurrences.


For example, when we say that the probability of tossing heads with a fair coin is 1/2, we mean that if the coin were tossed many times under constant conditions, then we would expect about half of the outcomes to be heads. Probabilities are often expressed as decimals or percentages. Thus, the probability of tossing heads with a fair coin can also be expressed as 0.5 or 50%.


If  an  experiment  or  observation  has  n possible  outcomes,  then  the  probabilities  of  those  outcomes  must  be  nonnegative 
fractions  whose  sum  is  1 . The  probabilities  are  nonnegative  because  each  describes  the  fraction  of  occurrences  of  an 
outcome  over  the  long  term,  and  the  sum  is  1 because  they  account  for  all  possible  outcomes.  For  example,  if  a box 
containing  10  balls  has  one  red  ball,  three  green  balls,  and  six  yellow  balls,  and  if  a ball  is  drawn  at  random  from  the 
box,  then  the  probabilities  of  the  various  outcomes  are 

$p_1$ = prob(red) = 1/10 = 0.1
$p_2$ = prob(green) = 3/10 = 0.3
$p_3$ = prob(yellow) = 6/10 = 0.6

Each probability is a nonnegative fraction and

\[ p_1 + p_2 + p_3 = 0.1 + 0.3 + 0.6 = 1 \]

In a stochastic process with n possible states, the state vector at each time t has the form

\[ x(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{bmatrix} \]

where $x_i(t)$ is the probability that the system is in state i, for i = 1, 2, ..., n.

The entries in this vector must add up to 1 since they account for all n possibilities. In general, a vector with nonnegative entries that add up to 1 is called a probability vector.

EXAMPLE 3 Example 1 Revisited from the Probability Viewpoint

Observe  that  the  state  vectors  in  Example  1 and  Example  2 are  all  probability  vectors.  This  is  to  be 
expected  since  the  entries  in  each  state  vector  are  the  fractional  market  shares  of  the  channels,  and  together 
they  account  for  the  entire  market.  In  practice,  it  is  preferable  to  interpret  the  entries  in  the  state  vectors  as 
probabilities  rather  than  exact  market  fractions,  since  market  information  is  usually  obtained  by  statistical 
sampling  procedures  with  intrinsic  uncertainties.  Thus,  for  example,  the  state  vector 


\[ x(1) = \begin{bmatrix} x_1(1) \\ x_2(1) \end{bmatrix} = \begin{bmatrix} 0.45 \\ 0.55 \end{bmatrix} \]

which  we  interpreted  in  Example  1 to  mean  that  channel  1 has  45%  of  the  market  and  channel  2 has  55%, 
can  also  be  interpreted  to  mean  that  an  individual  picked  at  random  from  the  market  will  be  a channel  1 
viewer  with  probability  0.45  and  a channel  2 viewer  with  probability  0.55. 


A square matrix, each of whose columns is a probability vector, is called a stochastic matrix. Such matrices commonly occur in formulas that relate successive states of a stochastic process. For example, the state vectors $x(k+1)$ and $x(k)$ in 7 are related by an equation of the form $x(k+1) = Px(k)$ in which

\[ P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \tag{10} \]


is  a stochastic  matrix.  It  should  not  be  surprising  that  the  column  vectors  of  P are  probability  vectors,  since  the  entries 
in  each  column  provide  a breakdown  of  what  happens  to  each  channel's  market  share  over  the  year — the  entries  in 
column  1 convey  that  each  year  channel  1 retains  80%  of  its  market  share  and  loses  20%;  and  the  entries  in  column  2 
convey  that  each  year  channel  2 retains  90%  of  its  market  share  and  loses  10%.  The  entries  in  10  can  also  be  viewed  as 
probabilities: 

$p_{11}$ = 0.8 = probability that a channel 1 viewer remains a channel 1 viewer

$p_{21}$ = 0.2 = probability that a channel 1 viewer becomes a channel 2 viewer

$p_{12}$ = 0.1 = probability that a channel 2 viewer becomes a channel 1 viewer

$p_{22}$ = 0.9 = probability that a channel 2 viewer remains a channel 2 viewer
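
The two definitions above (probability vector and stochastic matrix) translate directly into a check on the columns of a matrix. The following is a minimal sketch; the function names and the tolerance `tol` used for floating-point comparison are assumptions made for illustration, not part of the text.

```python
def is_probability_vector(v, tol=1e-9):
    """True if all entries are nonnegative and they sum to 1 (within tol)."""
    return all(entry >= 0 for entry in v) and abs(sum(v) - 1.0) <= tol


def is_stochastic(P, tol=1e-9):
    """True if P is square and each of its columns is a probability vector."""
    n = len(P)
    if any(len(row) != n for row in P):
        return False
    columns = [[P[i][j] for i in range(n)] for j in range(n)]
    return all(is_probability_vector(col, tol) for col in columns)


# The matrix of Formula 10: each column sums to 1.
P = [[0.8, 0.1],
     [0.2, 0.9]]

# Its transpose: the rows sum to 1 but the columns do not.
Q = [[0.8, 0.2],
     [0.1, 0.9]]

print(is_stochastic(P), is_stochastic(Q))
```

Note that the check is on columns, not rows, which matches the convention of this section; some other references define stochastic matrices by their rows.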

Example  1 is  a special  case  of  a large  class  of  stochastic  processes,  called  Markov  chains. 


Andrei  Andreyevich  Markov  (1856-1922) 

Markov  chains  are  named  in  honor  of  the  Russian  mathematician  A.  A.  Markov,  a lover  of 
poetry,  who  used  them  to  analyze  the  alternation  of  vowels  and  consonants  in  the  poem  Eugene  Onegin  by 
Pushkin.  Markov  believed  that  the  only  applications  of  his  chains  were  to  the  analysis  of  literary  works,  so  he 
would  be  astonished  to  learn  that  his  discovery  is  used  today  in  the  social  sciences,  quantum  theory,  and 
genetics! 



DEFINITION  1 

A Markov chain is a dynamical system whose state vectors at a succession of time intervals are probability vectors and for which the state vectors at successive time intervals are related by an equation of the form

\[ x(k+1) = Px(k) \]

in which $P = [p_{ij}]$ is a stochastic matrix and $p_{ij}$ is the probability that the system will be in state i at time t = k + 1 if it is in state j at time t = k. The matrix P is called the transition matrix for the system.


Note  that  in  this  definition  the  row  index  i corresponds  to  the  later  state  and  the  column  index  j to  the  earlier 
state  (Figure  4.12.2). 


Figure 4.12.2 The entry $p_{ij}$ is the probability that the system is in state i at time t = k + 1 if it is in state j at time t = k. (Columns are indexed by the state at time t = k, rows by the state at time t = k + 1.)


EXAMPLE  4 Wildlife  Migration  as  a Markov  Chain 


Suppose  that  a tagged  lion  can  migrate  over  three  adjacent  game  reserves  in  search  of  food,  reserve  1, 
reserve  2,  and  reserve  3.  Based  on  data  about  the  food  resources,  researchers  conclude  that  the  monthly 
migration  pattern  of  the  lion  can  be  modeled  by  a Markov  chain  with  transition  matrix 

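\[ P = \begin{bmatrix} 0.5 & 0.4 & 0.6 \\ 0.2 & 0.2 & 0.3 \\ 0.3 & 0.4 & 0.1 \end{bmatrix} \]

(see Figure 4.12.3), where column j corresponds to reserve j at time t = k and row i corresponds to reserve i at time t = k + 1. That is,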

$p_{11}$ = 0.5 = probability that the lion will stay in reserve 1 when it is in reserve 1

$p_{12}$ = 0.4 = probability that the lion will move from reserve 2 to reserve 1

$p_{13}$ = 0.6 = probability that the lion will move from reserve 3 to reserve 1

$p_{21}$ = 0.2 = probability that the lion will move from reserve 1 to reserve 2

$p_{22}$ = 0.2 = probability that the lion will stay in reserve 2 when it is in reserve 2

$p_{23}$ = 0.3 = probability that the lion will move from reserve 3 to reserve 2

$p_{31}$ = 0.3 = probability that the lion will move from reserve 1 to reserve 3

$p_{32}$ = 0.4 = probability that the lion will move from reserve 2 to reserve 3

$p_{33}$ = 0.1 = probability that the lion will stay in reserve 3 when it is in reserve 3

Assuming that t is in months and the lion is released in reserve 2 at time t = 0, track its probable locations over a six-month period.

Figure 4.12.3


Let $x_1(k)$, $x_2(k)$, and $x_3(k)$ be the probabilities that the lion is in reserve 1, 2, or 3, respectively, at time t = k, and let

\[ x(k) = \begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \end{bmatrix} \]

be the state vector at that time. Since we know with certainty that the lion is in reserve 2 at time t = 0, the initial state vector is

\[ x(0) = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \]

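We leave it for you to show that the state vectors over a six-month period are

\[ x(1) = Px(0) = \begin{bmatrix} 0.400 \\ 0.200 \\ 0.400 \end{bmatrix}, \quad x(2) = Px(1) = \begin{bmatrix} 0.520 \\ 0.240 \\ 0.240 \end{bmatrix}, \quad x(3) = Px(2) = \begin{bmatrix} 0.500 \\ 0.224 \\ 0.276 \end{bmatrix} \]

\[ x(4) = Px(3) \approx \begin{bmatrix} 0.505 \\ 0.228 \\ 0.267 \end{bmatrix}, \quad x(5) = Px(4) \approx \begin{bmatrix} 0.504 \\ 0.227 \\ 0.269 \end{bmatrix}, \quad x(6) = Px(5) \approx \begin{bmatrix} 0.504 \\ 0.227 \\ 0.269 \end{bmatrix} \]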

As  in  Example  2,  the  state  vectors  here  seem  to  stabilize  over  time  with  a probability  of  approximately 
0.504  that  the  lion  is  in  reserve  1,  a probability  of  approximately  0.227  that  it  is  in  reserve  2,  and  a 
probability  of  approximately  0.269  that  it  is  in  reserve  3. 
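
The six state vectors of Example 4 can be reproduced with exact rational arithmetic, which also makes it easy to confirm that every state vector remains a probability vector. This is a sketch only: the helper `step` and the list `history` are hypothetical names, and Python's standard `fractions` module is used in place of hand computation.

```python
from fractions import Fraction as F

# Transition matrix of Example 4 written with exact fractions (tenths).
P = [[F(5, 10), F(4, 10), F(6, 10)],
     [F(2, 10), F(2, 10), F(3, 10)],
     [F(3, 10), F(4, 10), F(1, 10)]]


def step(P, x):
    """Return the matrix-vector product P x."""
    return [sum(p * xj for p, xj in zip(row, x)) for row in P]


# The lion starts in reserve 2 with certainty.
x = [F(0), F(1), F(0)]
history = [x]
for month in range(1, 7):
    x = step(P, x)
    history.append(x)
    assert sum(x) == 1  # each state vector is still a probability vector
    print(month, [f"{float(v):.3f}" for v in x])
```

Because the entries are stored as fractions, the check `sum(x) == 1` is exact rather than approximate; only the printed values are rounded to three decimal places.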


Markov  Chains  in  Terms  of  Powers  of  the  Transition  Matrix 

In a Markov chain with an initial state of $x(0)$, the successive state vectors are

\[ x(1) = Px(0), \quad x(2) = Px(1), \quad x(3) = Px(2), \quad x(4) = Px(3), \ldots \]

For brevity, it is common to denote $x(k)$ by $x_k$, which allows us to write the successive state vectors more briefly as

\[ x_1 = Px_0, \quad x_2 = Px_1, \quad x_3 = Px_2, \quad x_4 = Px_3, \ldots \tag{11} \]

Alternatively, these state vectors can be expressed in terms of the initial state vector $x_0$ as

\[ x_1 = Px_0, \quad x_2 = P(Px_0) = P^2x_0, \quad x_3 = P(P^2x_0) = P^3x_0, \quad x_4 = P(P^3x_0) = P^4x_0, \ldots \]

from which it follows that

\[ x_k = P^kx_0 \tag{12} \]

Note that Formula 12 makes it possible to compute the state vector $x_k$ without first computing the earlier state vectors as required in Formula 11.
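
The formula $x_k = P^kx_0$ suggests computing the power of P first and applying it to $x_0$ once. One possible sketch is shown below; the use of repeated squaring to form the power is a choice made here for illustration (the text does not prescribe a method), and the names `mat_mul`, `mat_pow`, and `state` are hypothetical.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    inner, cols = len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(len(A))]


def mat_pow(P, k):
    """Compute P**k for a square matrix P and integer k >= 0 by repeated squaring."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in P]
    while k > 0:
        if k & 1:               # this binary digit of k contributes a factor
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        k >>= 1
    return result


def state(P, x0, k):
    """State vector x_k = P^k x_0, without forming x_1, ..., x_{k-1}."""
    Pk = mat_pow(P, k)
    return [sum(Pk[i][j] * x0[j] for j in range(len(x0))) for i in range(len(Pk))]


P = [[0.8, 0.1],
     [0.2, 0.9]]
x0 = [0.5, 0.5]

P3 = mat_pow(P, 3)
x3 = state(P, x0, 3)
print(P3)
print(x3)
```

Column j of $P^k$ is the state vector reached after k steps when the system starts in state j with certainty, so the entries of `mat_pow(P, k)` can be read as k-step transition probabilities.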


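EXAMPLE 5 Finding a State Vector Directly from $x_0$


Use Formula 12 to find the state vector $x(3)$ in Example 2.


From 1 and 7, the initial state vector and transition matrix are

\[ x_0 = x(0) = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} \quad \text{and} \quad P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \]

We leave it for you to calculate $P^3$ and show that

\[ x_3 = P^3x_0 = \begin{bmatrix} 0.562 & 0.219 \\ 0.438 & 0.781 \end{bmatrix} \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 0.3905 \\ 0.6095 \end{bmatrix} \]

which agrees with the result in 8.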


Long-Term  Behavior  of  a Markov  Chain 


We  have  seen  two  examples  of  Markov  chains  in  which  the  state  vectors  seem  to  stabilize  after  a period  of  time.  Thus, 
it  is  reasonable  to  ask  whether  all  Markov  chains  have  this  property.  The  following  example  shows  that  this  is  not  the 
case. 


EXAMPLE  6 A Markov  Chain  That  Does  Not  Stabilize 


The matrix

\[ P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]

is stochastic and hence can be regarded as the transition matrix for a Markov chain. A simple calculation shows that $P^2 = I$, from which it follows that

\[ I = P^2 = P^4 = P^6 = \cdots \quad \text{and} \quad P = P^3 = P^5 = P^7 = \cdots \]

Thus, the successive states in the Markov chain with initial vector $x_0$ are

\[ x_0, \; Px_0, \; x_0, \; Px_0, \; x_0, \ldots \]

which oscillate between $x_0$ and $Px_0$. Thus, the Markov chain does not stabilize unless both components of $x_0$ are 1/2 (verify).


A precise definition of what it means for a sequence of numbers or vectors to stabilize is given in calculus; however, that level of precision will not be needed here. Stated informally, we will say that a sequence of vectors

\[ x_1, \; x_2, \ldots, \; x_k, \ldots \]

approaches a limit q or that it converges to q if all entries in $x_k$ can be made as close as we like to the corresponding entries in the vector q by taking k sufficiently large. We denote this by writing $x_k \to q$ as $k \to \infty$.

We  saw  in  Example  6 that  the  state  vectors  of  a Markov  chain  need  not  approach  a limit  in  all  cases.  However,  by 
imposing  a mild  condition  on  the  transition  matrix  of  a Markov  chain,  we  can  guarantee  that  the  state  vectors  will 
approach  a limit. 




DEFINITION  2 

A stochastic  matrix  P is  said  to  be  regular  if  P or  some  positive  power  of  P has  all  positive  entries,  and  a 
Markov  chain  whose  transition  matrix  is  regular  is  said  to  be  a regular  Markov  chain. 




EXAMPLE  7 Regular  Stochastic  Matrices 


The transition matrices in Example 2 and Example 4 are regular because their entries are positive. The matrix

\[ P = \begin{bmatrix} 0.5 & 1 \\ 0.5 & 0 \end{bmatrix} \]

is regular because

\[ P^2 = \begin{bmatrix} 0.75 & 0.5 \\ 0.25 & 0.5 \end{bmatrix} \]

has positive entries. The matrix P in Example 6 is not regular because P and every positive power of P have some zero entries (verify).
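
Definition 2 can be tested mechanically by examining successive powers of P. The sketch below tracks only which entries are positive, since that is all the definition asks about. It relies on an assumption that is not stated in the text: a standard result on nonnegative matrices (often called Wielandt's bound) says that if any power of an n x n nonnegative matrix has all positive entries, then one of the first (n - 1)^2 + 1 powers already does, so the search can stop there. The name `is_regular` is hypothetical.

```python
def is_regular(P):
    """Check whether some positive power of the stochastic matrix P is entrywise positive.

    Only the positions of the positive entries are tracked. The cutoff
    (n - 1)**2 + 1 is an assumption taken from Wielandt's bound, not from the text.
    """
    n = len(P)
    pattern = [[entry > 0 for entry in row] for row in P]  # positions of positive entries of P
    power = pattern                                        # positions of positive entries of P^m
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in power):
            return True
        # Entry (i, j) of P^(m+1) is positive exactly when some t has
        # (P^m)[i][t] > 0 and P[t][j] > 0, because all entries are nonnegative.
        power = [[any(power[i][t] and pattern[t][j] for t in range(n))
                  for j in range(n)]
                 for i in range(n)]
    return False


print(is_regular([[0.5, 1.0], [0.5, 0.0]]))  # matrix of Example 7
print(is_regular([[0.0, 1.0], [1.0, 0.0]]))  # matrix of Example 6
```

Working with true/false patterns instead of numerical powers avoids any concern about very small positive entries being rounded to zero.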


The  following  theorem,  which  we  state  without  proof,  is  the  fundamental  result  about  the  long-term  behavior  of 
Markov  chains. 


THEOREM  4.12.1 

If  P is  the  transition  matrix  for  a regular  Markov  chain,  then: 

(a)  There  is  a unique  probability  vector  q such  that  Pq  = q. 

(b) For any initial probability vector $x_0$, the sequence of state vectors

\[ x_0, \; Px_0, \; P^2x_0, \ldots, \; P^kx_0, \ldots \]

converges to q.


The  vector  q in  this  theorem  is  called  the  steady-state  vector  of  the  Markov  chain.  It  can  be  found  by  rewriting  the 
equation  in  part  (a)  as 

(I-P)  q = 0 

and  then  solving  this  equation  for  q subject  to  the  requirement  that  q be  a probability  vector.  Here  are  some  examples. 


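EXAMPLE 8 Example 1 and Example 2 Revisited


The transition matrix for the Markov chain in Example 2 is

\[ P = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix} \]

Since the entries of P are positive, the Markov chain is regular and hence has a unique steady-state vector q. To find q we will solve the system $(I - P)q = 0$, which we can write as

\[ \begin{bmatrix} 0.2 & -0.1 \\ -0.2 & 0.1 \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]

The general solution of this system is

\[ q_1 = 0.5s, \quad q_2 = s \]

(verify), which we can write in vector form as

\[ q = \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} 0.5s \\ s \end{bmatrix} = s \begin{bmatrix} \tfrac{1}{2} \\ 1 \end{bmatrix} \tag{13} \]

For q to be a probability vector, we must have

\[ 1 = q_1 + q_2 = \tfrac{3}{2}s \]

which implies that $s = \tfrac{2}{3}$. Substituting this value in 13 yields the steady-state vector

\[ q = \begin{bmatrix} \tfrac{1}{3} \\ \tfrac{2}{3} \end{bmatrix} \]

which is consistent with the numerical results obtained in 9.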


EXAMPLE  9 Example  4 Revisited 


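The transition matrix for the Markov chain in Example 4 is

\[ P = \begin{bmatrix} 0.5 & 0.4 & 0.6 \\ 0.2 & 0.2 & 0.3 \\ 0.3 & 0.4 & 0.1 \end{bmatrix} \]

Since the entries of P are positive, the Markov chain is regular and hence has a unique steady-state vector q. To find q we will solve the system $(I - P)q = 0$, which we can write (using fractions) as

\[ \begin{bmatrix} \tfrac{1}{2} & -\tfrac{2}{5} & -\tfrac{3}{5} \\ -\tfrac{1}{5} & \tfrac{4}{5} & -\tfrac{3}{10} \\ -\tfrac{3}{10} & -\tfrac{2}{5} & \tfrac{9}{10} \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{14} \]

(We have converted to fractions to avoid roundoff error in this illustrative example.) We leave it for you to confirm that the reduced row echelon form of the coefficient matrix is

\[ \begin{bmatrix} 1 & 0 & -\tfrac{15}{8} \\ 0 & 1 & -\tfrac{27}{32} \\ 0 & 0 & 0 \end{bmatrix} \]

and that the general solution of 14 is

\[ q_1 = \tfrac{15}{8}s, \quad q_2 = \tfrac{27}{32}s, \quad q_3 = s \tag{15} \]

For q to be a probability vector we must have $q_1 + q_2 + q_3 = 1$, from which it follows that $s = \tfrac{32}{119}$ (verify). Substituting this value in 15 yields the steady-state vector

\[ q = \begin{bmatrix} \tfrac{60}{119} \\ \tfrac{27}{119} \\ \tfrac{32}{119} \end{bmatrix} \approx \begin{bmatrix} 0.5042 \\ 0.2269 \\ 0.2689 \end{bmatrix} \]

(verify), which is consistent with the results obtained in Example 4.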
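
The procedure used in the last two examples (solve $(I - P)q = 0$, then scale so that the entries of q sum to 1) can be sketched with exact fractions. The variant below is an assumption made for convenience rather than the text's exact steps: instead of finding the general solution and then choosing s, it replaces the last equation of $(I - P)q = 0$ with the normalization $q_1 + \cdots + q_n = 1$ and solves the resulting square system by Gauss-Jordan elimination. The name `steady_state` is hypothetical, and the sketch assumes P is regular, so that the solution is unique.

```python
from fractions import Fraction


def steady_state(P):
    """Steady-state vector of a regular stochastic matrix P, as exact fractions.

    Entries of P may be Fractions, integers, or decimal strings such as "0.8".
    """
    n = len(P)
    # First n - 1 rows of (I - P); the last row is replaced by q1 + ... + qn = 1.
    A = [[Fraction(int(i == j)) - Fraction(P[i][j]) for j in range(n)]
         for i in range(n - 1)]
    A.append([Fraction(1)] * n)
    b = [Fraction(0)] * (n - 1) + [Fraction(1)]
    M = [row + [rhs] for row, rhs in zip(A, b)]   # augmented matrix

    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


q2 = steady_state([["0.8", "0.1"],
                   ["0.2", "0.9"]])
q3 = steady_state([["0.5", "0.4", "0.6"],
                   ["0.2", "0.2", "0.3"],
                   ["0.3", "0.4", "0.1"]])
print(q2)
print(q3)
```

Passing the entries as decimal strings lets `Fraction` store them exactly (for example, "0.1" becomes 1/10), which avoids the roundoff error mentioned in Example 9.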


Concept  Review 

Dynamical  system 
State  of  a variable 
State  of  a dynamical  system 
Stochastic  process 
Probability 
Probability  vector 
Stochastic  matrix 
Markov  chain 
Transition  matrix 
Regular  stochastic  matrix 
Regular  Markov  chain 
Steady-state  vector 

Skills 

Determine  whether  a matrix  is  stochastic. 

Compute  the  state  vectors  from  a transition  matrix  and  an  initial  state. 
Determine  whether  a stochastic  matrix  is  regular. 

Determine  whether  a Markov  chain  is  regular. 

Find  the  steady-state  vector  for  a regular  transition  matrix. 


Exercise  Set  4.12 


In  Exercises  1-2,  determine  whether  A is  a stochastic  matrix.  If  A is  not  stochastic,  then  explain  why  not. 


1. (a) \[ A = \begin{bmatrix} 0.4 & 0.3 \\ 0.6 & 0.7 \end{bmatrix} \]

(b) \[ A = \begin{bmatrix} 0.4 & 0.6 \\ 0.3 & 0.7 \end{bmatrix} \]

(C) 


A = 


(d) 


A = 


1 

2 

0 

1 

2 

I 

3 

I 

3 

I 

3 


i 

3 

I 

3 

I 

3 


1 

2 
2 
2 

1 


Answer: 


(a)  Stochastic 

(b)  Not  stochastic 

(c)  Stochastic 

(d)  Not  stochastic 


2. (a) \[ A = \begin{bmatrix} 0.2 & 0.9 \\ 0.8 & 0.1 \end{bmatrix} \]

(b) \[ A = \begin{bmatrix} 0.2 & 0.8 \\ 0.9 & 0.1 \end{bmatrix} \]

(c) 


A = 


12 

1 

2 

5_ 

12 


I 

9 

0 

8 

9 


I 

6 

5 

6 

0 


(d) 


A = 


0 


2 


I I 

3 2 

I I 

3 2 


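In Exercises 3-4, use Formulas 11 and 12 to compute the state vector $x_4$ in two different ways.


3. \[ P = \begin{bmatrix} 0.5 & 0.6 \\ 0.5 & 0.4 \end{bmatrix}; \quad x_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} \]


Answer:

\[ x_4 = \begin{bmatrix} 0.54545 \\ 0.45455 \end{bmatrix} \]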

4. \[ P = \begin{bmatrix} 0.8 & 0.5 \\ 0.2 & 0.5 \end{bmatrix}; \quad x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \]


In  Exercises  5-6,  determine  whether  P is  a regular  stochastic  matrix. 


p= 


5-  (a) 


(b) 


(c) 


P = 


P = 


1 

7 

6 

7 

0 

1 


1 

0 


Answer: 


(a)  Regular 

(b)  Not  regular 

(c)  Regular 

6-  (a) 

P 


3 1 

4 3 

1 2 

4 3 

In  Exercises  7-10,  verify  that  P is  a regular  stochastic  matrix,  and  find  the  steady-state  vector  for  the  associated 
Markov  chain. 


(b) 

P = 

(c) 

P = 


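7. \[ P = \begin{bmatrix} \tfrac{1}{4} & \tfrac{2}{3} \\ \tfrac{3}{4} & \tfrac{1}{3} \end{bmatrix} \]


Answer:

\[ \begin{bmatrix} \tfrac{8}{17} \\ \tfrac{9}{17} \end{bmatrix} \]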


8. \[ P = \begin{bmatrix} 0.2 & 0.6 \\ 0.8 & 0.4 \end{bmatrix} \]


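9. \[ P = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 \\ \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{1}{3} \\ \tfrac{1}{4} & 0 & \tfrac{2}{3} \end{bmatrix} \]


Answer:

\[ \begin{bmatrix} \tfrac{4}{11} \\ \tfrac{4}{11} \\ \tfrac{3}{11} \end{bmatrix} \]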


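10. \[ P = \begin{bmatrix} \tfrac{1}{3} & \tfrac{1}{4} & \tfrac{2}{5} \\ 0 & \tfrac{3}{4} & \tfrac{2}{5} \\ \tfrac{2}{3} & 0 & \tfrac{1}{5} \end{bmatrix} \]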


11.  Consider  a Markov  process  with  transition  matrix 


State  1 State  2 

State  1 0.2  0.1 

State  2 0.8  0.9 


(a)  What  does  the  entry  0.2  represent? 

(b) What does the entry 0.1 represent?

(c)  If  the  system  is  in  state  1 initially,  what  is  the  probability  that  it  will  be  in  state  2 at  the  next  observation? 

(d)  If  the  system  has  a 50%  chance  of  being  in  state  1 initially,  what  is  the  probability  that  it  will  be  in  state  2 at  the 
next  observation? 


Answer: 


(a)  Probability  that  something  in  state  1 stays  in  state  1 

(b)  Probability  that  something  in  state  2 moves  to  state  1 

(c)  0.8 

(d)  0.85 


12.  Consider  a Markov  process  with  transition  matrix 


State  1 State  2 


State  1 
State  2 


0 i 

* § 


(a)  what  does  the  entry  y represent? 

(b)  What  does  the  entry  0 represent? 


(c)  If  the  system  is  in  state  1 initially,  what  is  the  probability  that  it  will  be  in  state  1 at  the  next  observation? 

(d)  If  the  system  has  a 50%  chance  of  being  in  state  1 initially,  what  is  the  probability  that  it  will  be  in  state  2 at  the 
next  observation? 

13.  On  a given  day  the  air  quality  in  a certain  city  is  either  good  or  bad.  Records  show  that  when  the  air  quality  is  good 
on  one  day,  then  there  is  a 95%  chance  that  it  will  be  good  the  next  day,  and  when  the  air  quality  is  bad  on  one  day, 
then  there  is  a 45%  chance  that  it  will  be  bad  the  next  day. 

(a)  Find  a transition  matrix  for  this  phenomenon. 

(b)  If  the  air  quality  is  good  today,  what  is  the  probability  that  it  will  be  good  two  days  from  now? 

(c)  If  the  air  quality  is  bad  today,  what  is  the  probability  that  it  will  be  bad  three  days  from  now? 

(d)  If  there  is  a 20%  chance  that  the  air  quality  will  be  good  today,  what  is  the  probability  that  it  will  be  good 
tomorrow? 

Answer: 

(a) \[ \begin{bmatrix} 0.95 & 0.55 \\ 0.05 & 0.45 \end{bmatrix} \]

(b)  0.93 

(c)  0.142 

(d)  0.63 

14.  In  a laboratory  experiment,  a mouse  can  choose  one  of  two  food  types  each  day,  type  I or  type  II.  Records  show  that 
if  the  mouse  chooses  type  I on  a given  day,  then  there  is  a 75%  chance  that  it  will  choose  type  I the  next  day,  and  if 
it  chooses  type  II  on  one  day,  then  there  is  a 50%  chance  that  it  will  choose  type  II  the  next  day. 

(a)  Find  a transition  matrix  for  this  phenomenon. 

(b)  If  the  mouse  chooses  type  I today,  what  is  the  probability  that  it  will  choose  type  I two  days  from  now? 

(c)  If  the  mouse  chooses  type  II  today,  what  is  the  probability  that  it  will  choose  type  II  three  days  from  now? 

(d)  If  there  is  a 10%  chance  that  the  mouse  will  choose  type  I today,  what  is  the  probability  that  it  will  choose  type 
I tomorrow? 

15.  Suppose  that  at  some  initial  point  in  time  100,000  people  live  in  a certain  city  and  25,000  people  live  in  its  suburbs. 
The  Regional  Planning  Commission  determines  that  each  year  5%  of  the  city  population  moves  to  the  suburbs  and 
3%  of  the  suburban  population  moves  to  the  city. 

(a)  Assuming  that  the  total  population  remains  constant,  make  a table  that  shows  the  populations  of  the  city  and  its 
suburbs  over  a five-year  period  (round  to  the  nearest  integer). 

(b)  Over  the  long  term,  how  will  the  population  be  distributed  between  the  city  and  its  suburbs? 

Answer: 


Suburbs 


78,125 


16.  Suppose  that  two  competing  television  stations,  station  1 and  station  2,  each  have  50%  of  the  viewer  market  at  some 
initial  point  in  time.  Assume  that  over  each  one-year  period  station  1 captures  5%  of  station  2’s  market  share  and 
station 2 captures 10% of station 1's market share.

(a)  Make  a table  that  shows  the  market  share  of  each  station  over  a five-year  period. 

(b)  Over  the  long  term,  how  will  the  market  share  be  distributed  between  the  two  stations? 

17.  Suppose  that  a car  rental  agency  has  three  locations,  numbered  1,  2,  and  3.  A customer  may  rent  a car  from  any  of 
the  three  locations  and  return  it  to  any  of  the  three  locations.  Records  show  that  cars  are  rented  and  returned  in 
accordance  with  the  following  probabilities: 


                             Rented from Location
                               1        2        3
Returned to Location   1     1/10     1/5      3/5
                       2     4/5      3/10     1/5
                       3     1/10     1/2      1/5

(a)  Assuming  that  a car  is  rented  from  location  1,  what  is  the  probability  that  it  will  be  at  location  1 after  two 
rentals? 

(b)  Assuming  that  this  dynamical  system  can  be  modeled  as  a Markov  chain,  find  the  steady-state  vector. 

(c)  If  the  rental  agency  owns  120  cars,  how  many  parking  spaces  should  it  allocate  at  each  location  to  be 
reasonably  certain  that  it  will  have  enough  spaces  for  the  cars  over  the  long  term?  Explain  your  reasoning. 


Answer: 


(a) $\dfrac{23}{100}$

(b) $\begin{bmatrix} 46/159 \\ 22/53 \\ 47/159 \end{bmatrix}$


(c)  35,50,35 


18.  Physical  traits  are  determined  by  the  genes  that  an  offspring  receives  from  its  parents.  In  the  simplest  case  a trait  in 
the  offspring  is  determined  by  one  pair  of  genes,  one  member  of  the  pair  inherited  from  the  male  parent  and  the 
other  from  the  female  parent.  Typically,  each  gene  in  a pair  can  assume  one  of  two  forms,  called  alleles,  denoted  by 
A and  a.  This  leads  to  three  possible  pairings: 

AA,  Aa,  aa 

called  genotypes  (the  pairs  Aa  and  aA  determine  the  same  trait  and  hence  are  not  distinguished  from  one  another).  It 
is  shown  in  the  study  of  heredity  that  if  a parent  of  known  genotype  is  crossed  with  a random  parent  of  unknown 
genotype,  then  the  offspring  will  have  the  genotype  probabilities  given  in  the  following  table,  which  can  be  viewed 
as  a transition  matrix  for  a Markov  process: 


                                Genotype of Parent
                                AA       Aa       aa
Genotype of Offspring   AA     1/2      1/4       0
                        Aa     1/2      1/2      1/2
                        aa      0       1/4      1/2

Thus, for example, the offspring of a parent of genotype AA that is crossed at random with a parent of unknown
genotype will have a 50% chance of being AA, a 50% chance of being Aa, and no chance of being aa.

(a) Show that the transition matrix is regular.

(b) Find the steady-state vector, and discuss its physical interpretation.

19.  Fill  in  the  missing  entries  of  the  stochastic  matrix 


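$$\begin{bmatrix} \frac{7}{10} & * & \frac{1}{5} \\ * & \frac{3}{10} & * \\ \frac{1}{10} & \frac{3}{5} & \frac{3}{10} \end{bmatrix}$$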

and  find  its  steady-state  vector. 


Answer: 


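$$P = \begin{bmatrix} \frac{7}{10} & \frac{1}{10} & \frac{1}{5} \\ \frac{1}{5} & \frac{3}{10} & \frac{1}{2} \\ \frac{1}{10} & \frac{3}{5} & \frac{3}{10} \end{bmatrix}; \quad \mathbf{q} = \begin{bmatrix} \frac{1}{3} \\ \frac{1}{3} \\ \frac{1}{3} \end{bmatrix}$$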

20. If $P$ is an $n \times n$ stochastic matrix, and if $M$ is a $1 \times n$ matrix whose entries are all 1's, then $MP =$ ____.

21.  If  P is  a regular  stochastic  matrix  with  steady-state  vector  q,  what  can  you  say  about  the  sequence  of  products 

$$P\mathbf{q},\ P^2\mathbf{q},\ P^3\mathbf{q},\ \ldots,\ P^k\mathbf{q},\ \ldots$$

as $k \to \infty$?


Answer: 

$P^k\mathbf{q} = \mathbf{q}$ for every positive integer $k$

22. (a) If $P$ is a regular $n \times n$ stochastic matrix with steady-state vector $\mathbf{q}$, and if $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ are the standard unit
vectors in column form, what can you say about the behavior of the sequence

$$P\mathbf{e}_i,\ P^2\mathbf{e}_i,\ P^3\mathbf{e}_i,\ \ldots,\ P^k\mathbf{e}_i,\ \ldots$$

as $k \to \infty$ for each $i = 1, 2, \ldots, n$?

(b) What does this tell you about the behavior of the column vectors of $P^k$ as $k \to \infty$?

23. Prove that the product of two stochastic matrices is a stochastic matrix. [Hint: Write each column of the product as
a linear combination of the columns of the first factor.]


24. Prove that if $P$ is a stochastic matrix whose entries are all greater than or equal to $\rho$, then the entries of $P^2$ are
greater than or equal to $\rho$.

True-False  Exercises 


In  parts  (a)-(e)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a) The vector $\begin{bmatrix} 1/3 \\ 0 \\ 2/3 \end{bmatrix}$ is a probability vector.

Answer: 


True 


(b) The matrix $\begin{bmatrix} 0.2 & 1 \\ 0.8 & 0 \end{bmatrix}$ is a regular stochastic matrix.


Answer: 

True 

(c)  The  column  vectors  of  a transition  matrix  are  probability  vectors. 

Answer: 

True 

(d) A steady-state vector for a Markov chain with transition matrix $P$ is any solution of the linear system $(I - P)\mathbf{q} = \mathbf{0}$.

Answer:

False 

(e)  The  square  of  every  regular  stochastic  matrix  is  stochastic. 

Answer: 

True 
Supplementary Exercises


1. Let $V$ be the set of all ordered triples of real numbers, and consider the following addition and scalar
multiplication operations on $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$:

$$\mathbf{u} + \mathbf{v} = (u_1 + v_1,\ u_2 + v_2,\ u_3 + v_3), \quad k\mathbf{u} = (ku_1, 0, 0)$$

(a) Compute $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ for $\mathbf{u} = (3, -2, 4)$, $\mathbf{v} = (1, 5, -2)$, and $k = -1$.

(b)  In  words,  explain  why  V is  closed  under  addition  and  scalar  multiplication. 

(c) Since the addition operation on $V$ is the standard addition operation on $R^3$, certain vector space axioms
hold for $V$ because they are known to hold for $R^3$. Which axioms in Definition 1 of Section 4.1 are
they?

(d)  Show  that  Axioms  7,  8,  and  9 hold. 

(e)  Show  that  Axiom  10  fails  for  the  given  operations. 

Answer: 

(a)  u + v = (4,  3,  2),  — u = ( — 3,  0,  0) 

(c)  Axioms  1-5 

2. In each part, the solution space of the system is a subspace of $R^3$ and so must be a line through the origin,
a plane through the origin, all of $R^3$, or the origin only. For each system, determine which is the case. If
the subspace is a plane, find an equation for it, and if it is a line, find parametric equations.


(a) $0x + 0y + 0z = 0$

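(b)
$$\begin{aligned} 2x - 3y + z &= 0 \\ 6x - 9y + 3z &= 0 \\ -4x + 6y - 2z &= 0 \end{aligned}$$

(c)
$$\begin{aligned} x - 2y + 7z &= 0 \\ -4x + 8y + 5z &= 0 \\ 2x - 4y + 3z &= 0 \end{aligned}$$

(d)
$$\begin{aligned} x + 4y + 8z &= 0 \\ 2x + 5y + 6z &= 0 \\ 3x + y - 4z &= 0 \end{aligned}$$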

3.  For  what  values  of  s is  the  solution  space  of 

$$\begin{aligned} x_1 + x_2 + sx_3 &= 0 \\ x_1 + sx_2 + x_3 &= 0 \\ sx_1 + x_2 + x_3 &= 0 \end{aligned}$$

the origin only, a line through the origin, a plane through the origin, or all of $R^3$?


Answer: 


If $s \neq 1, -2$, the solution space is the origin. If $s = 1$, the solution space is a plane through the origin. If
$s = -2$, the solution space is a line through the origin.

4. (a) Express $(4a,\ a - b,\ a + 2b)$ as a linear combination of $(4, 1, 1)$ and $(0, -1, 2)$.

(b) Express $(3a + b + 3c,\ -a + 4b - c,\ 2a + b + 2c)$ as a linear combination of $(3, -1, 2)$ and
$(1, 4, 1)$.

(c) Express $(2a - b + 4c,\ 3a - c,\ 4b + c)$ as a linear combination of three nonzero vectors.

5. Let $W$ be the space spanned by $\mathbf{f} = \sin x$ and $\mathbf{g} = \cos x$.

(a) Show that for any value of $\theta$, $\mathbf{f}_1 = \sin(x + \theta)$ and $\mathbf{g}_1 = \cos(x + \theta)$ are vectors in $W$.

(b) Show that $\mathbf{f}_1$ and $\mathbf{g}_1$ form a basis for $W$.

6. (a) Express $\mathbf{v} = (1, 1)$ as a linear combination of $\mathbf{v}_1 = (1, -1)$, $\mathbf{v}_2 = (3, 0)$, and $\mathbf{v}_3 = (2, 1)$ in two
different ways.

(b)  Explain  why  this  does  not  violate  Theorem  4.4.1. 

7. Let $A$ be an $n \times n$ matrix, and let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ be linearly independent vectors in $R^n$ expressed as $n \times 1$
matrices. What must be true about $A$ for $A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_n$ to be linearly independent?

Answer: 

A must  be  invertible 

8. Must a basis for $P_n$ contain a polynomial of degree $k$ for each $k = 0, 1, 2, \ldots, n$? Justify your answer.

9. For the purpose of this exercise, let us define a “checkerboard matrix” to be a square matrix $A = [a_{ij}]$
such that

$$a_{ij} = \begin{cases} 1 & \text{if } i + j \text{ is even} \\ 0 & \text{if } i + j \text{ is odd} \end{cases}$$

Find  the  rank  and  nullity  of  the  following  checkerboard  matrices. 

(a)  The  3x3  checkerboard  matrix. 

(b)  The  4x4  checkerboard  matrix. 

(c)  The  nxn  checkerboard  matrix. 

Answer: 

(a)  Rank  = 2,  nullity  = 1 
(b)  Rank  = 2,  nullity  = 2 

(c)  Rank  = 2,  nullity  = n — 2 

10. For the purpose of this exercise, let us define an “X-matrix” to be a square matrix with an odd number of
rows and columns that has 0's everywhere except on the two diagonals where it has 1's. Find the rank and
nullity of the following X-matrices.

(a) $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \end{bmatrix}$

(c) the X-matrix of size $(2n + 1) \times (2n + 1)$

11.  In  each  part,  show  that  the  stated  set  of  polynomials  is  a subspace  of  Pn  and  find  a basis  for  it. 

(a)  All  polynomials  in  Pn  such  that  p(  — x)  = p(x). 

(b) All polynomials in $P_n$ such that $p(0) = 0$.

Answer:

(a) $\{1, x^2, x^4, \ldots, x^{2m}\}$, where $2m = n$ if $n$ is even and $2m = n - 1$ if $n$ is odd.

(b) $\{x, x^2, x^3, \ldots, x^n\}$

12. (Calculus required) Show that the set of all polynomials in $P_n$ that have a horizontal tangent at $x = 0$ is a
subspace of $P_n$. Find a basis for this subspace.

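13. (a) Find a basis for the vector space of all $3 \times 3$ symmetric matrices.

(b) Find a basis for the vector space of all $3 \times 3$ skew-symmetric matrices.

Answer:

(a) $\left\{ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right\}$

(b) $\left\{ \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix} \right\}$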

14.  Various  advanced  texts  in  linear  algebra  prove  the  following  determinant  criterion  for  rank:  The  rank  of  a 
matrix  A is  r if  and  only  if A has  some  rxr  submatrix  with  a nonzero  determinant,  and  all  square 
submatrices  of  larger  size  have  determinant  zero.  [Note:  A submatrix  of  A is  any  matrix  obtained  by 
deleting  rows  or  columns  of  A.  The  matrix  A itself  is  also  considered  to  be  a submatrix  of  A.]  In  each  part, 
use  this  criterion  to  find  the  rank  of  the  matrix. 

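(a) $\begin{bmatrix} 1 & 2 & 0 \\ 2 & 4 & -1 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 0 & 1 \\ 2 & -1 & 3 \\ 3 & -1 & 4 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & -1 & 2 & 0 \\ 3 & 1 & 0 & 0 \\ -1 & 2 & 4 & 0 \end{bmatrix}$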

15.  Use  the  result  in  Exercise  14  above  to  find  the  possible  ranks  for  matrices  of  the  form 

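$$\begin{bmatrix} 0 & 0 & 0 & 0 & 0 & a_{16} \\ 0 & 0 & 0 & 0 & 0 & a_{26} \\ 0 & 0 & 0 & 0 & 0 & a_{36} \\ 0 & 0 & 0 & 0 & 0 & a_{46} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56} \end{bmatrix}$$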

Answer: 

Possible  ranks  are  2,  1,  and  0. 

16. Prove: If $S$ is a basis for a vector space $V$, then for any vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$ and any scalar $k$, the following
relationships hold.

(a) $(\mathbf{u} + \mathbf{v})_S = (\mathbf{u})_S + (\mathbf{v})_S$

(b) $(k\mathbf{u})_S = k(\mathbf{u})_S$


Copyright  © 2010  John  Wiley  & Sons,  Inc.  All  rights  reserved. 


CHAPTER 5

Eigenvalues and Eigenvectors


CHAPTER  CONTENTS 


Eigenvalues  and  Eigenvectors 
Diagonalization 
Complex  Vector  Spaces 
Differential  Equations 


INTRODUCTION 

In  this  chapter  we  will  focus  on  classes  of  scalars  and  vectors  known  as  “eigenvalues”  and 
“eigenvectors,”  terms  derived  from  the  German  word  eigen,  meaning  “own,”  “peculiar 
to,”  “characteristic,”  or  “individual.”  The  underlying  idea  first  appeared  in  the  study  of 
rotational  motion  but  was  later  used  to  classify  various  kinds  of  surfaces  and  to  describe 
solutions  of  certain  differential  equations.  In  the  early  1900s  it  was  applied  to  matrices  and 
matrix  transformations,  and  today  it  has  applications  in  such  diverse  fields  as  computer 
graphics,  mechanical  vibrations,  heat  flow,  population  dynamics,  quantum  mechanics,  and 
economics  to  name  just  a few. 
5.1 Eigenvalues and Eigenvectors

In  this  section  we  will  define  the  notions  of  “eigenvalue”  and  “eigenvector”  and  discuss  some  of  their  basic 
properties. 


Definition  of  Eigenvalue  and  Eigenvector 

We  begin  with  the  main  definition  in  this  section. 


DEFINITION  1 

If $A$ is an $n \times n$ matrix, then a nonzero vector $\mathbf{x}$ in $R^n$ is called an eigenvector of $A$ (or of the matrix
operator $T_A$) if $A\mathbf{x}$ is a scalar multiple of $\mathbf{x}$; that is,

$$A\mathbf{x} = \lambda\mathbf{x}$$

for some scalar $\lambda$. The scalar $\lambda$ is called an eigenvalue of $A$ (or of $T_A$), and $\mathbf{x}$ is said to be an
eigenvector corresponding to $\lambda$.


The requirement that an eigenvector be
nonzero is imposed to avoid the unimportant
case $A\mathbf{0} = \lambda\mathbf{0}$, which holds for every $A$ and $\lambda$.


In general, the image of a vector $\mathbf{x}$ under multiplication by a square matrix $A$ differs from $\mathbf{x}$ in both magnitude
and direction. However, in the special case where $\mathbf{x}$ is an eigenvector of $A$, multiplication by $A$ leaves the
direction unchanged. For example, in $R^2$ or $R^3$ multiplication by $A$ maps each eigenvector $\mathbf{x}$ of $A$ (if any)
along the same line through the origin as $\mathbf{x}$. Depending on the sign and magnitude of the eigenvalue $\lambda$
corresponding to $\mathbf{x}$, the operation $A\mathbf{x} = \lambda\mathbf{x}$ compresses or stretches $\mathbf{x}$ by a factor of $\lambda$, with a reversal of
direction in the case where $\lambda$ is negative (Figure 5.1.1).


Figure  5.1.1 


◄ 


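EXAMPLE 1 Eigenvector of a 2 × 2 Matrix

The vector $\mathbf{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ is an eigenvector of

$$A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix}$$

corresponding to the eigenvalue $\lambda = 3$, since

$$A\mathbf{x} = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \end{bmatrix} = 3\mathbf{x}$$

Geometrically, multiplication by $A$ has stretched the vector $\mathbf{x}$ by a factor of 3 (Figure 5.1.2).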
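The check in Example 1 is a single matrix-vector product, so it is easy to confirm by machine. The following is a minimal sketch in plain Python; the helper name `mat_vec` is an illustrative choice, not notation from the text.

```python
# Check Example 1 by direct multiplication: A x should equal 3 x.

def mat_vec(A, x):
    """Multiply a matrix (given as a list of rows) by a column vector (a list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[3, 0],
     [8, -1]]
x = [1, 2]

Ax = mat_vec(A, x)
print(Ax)                          # [3, 6]
print(Ax == [3 * xi for xi in x])  # True
```

The same helper can be used to test any candidate eigenvector: compute $A\mathbf{x}$ and see whether it is a scalar multiple of $\mathbf{x}$.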


Computing  Eigenvalues  and  Eigenvectors 

Our next objective is to obtain a general procedure for finding eigenvalues and eigenvectors of an $n \times n$
matrix $A$. We will begin with the problem of finding the eigenvalues of $A$. Note first that the equation
$A\mathbf{x} = \lambda\mathbf{x}$ can be rewritten as $A\mathbf{x} = \lambda I\mathbf{x}$, or equivalently, as

$$(\lambda I - A)\mathbf{x} = \mathbf{0}$$

For $\lambda$ to be an eigenvalue of $A$ this equation must have a nonzero solution for $\mathbf{x}$. But it follows from parts (b)
and (g) of Theorem 4.10.4 that this is so if and only if the coefficient matrix $\lambda I - A$ has a zero determinant.
Thus, we have the following result.


THEOREM  5.1.1 

If $A$ is an $n \times n$ matrix, then $\lambda$ is an eigenvalue of $A$ if and only if it satisfies the equation

$$\det(\lambda I - A) = 0 \qquad (1)$$


This  is  called  the  characteristic  equation  of  A. 


EXAMPLE 2 Finding Eigenvalues

In Example 1 we observed that $\lambda = 3$ is an eigenvalue of the matrix

$$A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix}$$

but we did not explain how we found it. Use the characteristic equation to find all eigenvalues
of this matrix.

It follows from Formula 1 that the eigenvalues of $A$ are the solutions of the equation
$\det(\lambda I - A) = 0$, which we can write as

$$\begin{vmatrix} \lambda - 3 & 0 \\ -8 & \lambda + 1 \end{vmatrix} = 0$$

from which we obtain

$$(\lambda - 3)(\lambda + 1) = 0 \qquad (2)$$

This shows that the eigenvalues of $A$ are $\lambda = 3$ and $\lambda = -1$. Thus, in addition to the
eigenvalue $\lambda = 3$ noted in Example 1, we have discovered a second eigenvalue $\lambda = -1$.

When the determinant $\det(\lambda I - A)$ that appears on the left side of 1 is expanded, the result is a polynomial
$p(\lambda)$ of degree $n$ that is called the characteristic polynomial of $A$. For example, it follows from 2 that the
characteristic polynomial of the $2 \times 2$ matrix $A$ in Example 2 is

$$p(\lambda) = (\lambda - 3)(\lambda + 1) = \lambda^2 - 2\lambda - 3$$

which is a polynomial of degree 2. In general, the characteristic polynomial of an $n \times n$ matrix has the form

$$p(\lambda) = \lambda^n + c_1\lambda^{n-1} + \cdots + c_n$$

in which the coefficient of $\lambda^n$ is 1 (Exercise 17). Since a polynomial of degree $n$ has at most $n$ distinct roots, it
follows that the equation

$$\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \qquad (3)$$

has at most $n$ distinct solutions and consequently that an $n \times n$ matrix has at most $n$ distinct eigenvalues. Since
some of these solutions may be complex numbers, it is possible for a matrix to have complex eigenvalues,
even if that matrix itself has real entries. We will discuss this issue in more detail later, but for now we will
focus on examples in which the eigenvalues are real numbers.
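For a $2 \times 2$ matrix the characteristic polynomial can be written down directly, since $\det(\lambda I - A) = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$ (this identity is the subject of Exercise 18). The sketch below uses it to recover the polynomial and eigenvalues of Example 2. The function names are illustrative, and only the real-eigenvalue case is handled, in keeping with the focus of this section.

```python
import math

def char_poly_2x2(A):
    """Return (1, -tr(A), det(A)): the coefficients of det(lambda*I - A),
    highest power first, for a 2 x 2 matrix A."""
    (a, b), (c, d) = A
    return (1, -(a + d), a * d - b * c)

def real_eigenvalues_2x2(A):
    """Real roots of the characteristic polynomial, via the quadratic formula.
    Returns an empty list when the roots are complex."""
    _, c1, c2 = char_poly_2x2(A)
    disc = c1 * c1 - 4 * c2
    if disc < 0:
        return []
    r = math.sqrt(disc)
    return sorted({(-c1 - r) / 2, (-c1 + r) / 2})

A = [[3, 0],
     [8, -1]]
print(char_poly_2x2(A))         # (1, -2, -3)
print(real_eigenvalues_2x2(A))  # [-1.0, 3.0]
```

A repeated eigenvalue is reported once, because the roots are collected in a set before sorting.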

EXAMPLE 3 Eigenvalues of a 3 × 3 Matrix

Find the eigenvalues of

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 4 & -17 & 8 \end{bmatrix}$$

The characteristic polynomial of $A$ is

$$\det(\lambda I - A) = \det \begin{bmatrix} \lambda & -1 & 0 \\ 0 & \lambda & -1 \\ -4 & 17 & \lambda - 8 \end{bmatrix} = \lambda^3 - 8\lambda^2 + 17\lambda - 4$$

The eigenvalues of $A$ must therefore satisfy the cubic equation

$$\lambda^3 - 8\lambda^2 + 17\lambda - 4 = 0 \qquad (4)$$

To solve this equation, we will begin by searching for integer solutions. This task can be
simplified by exploiting the fact that all integer solutions (if there are any) of a polynomial
equation with integer coefficients

$$\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0$$

must be divisors of the constant term, $c_n$. Thus, the only possible integer solutions of 4 are the
divisors of $-4$, that is, $\pm 1, \pm 2, \pm 4$. Successively substituting these values in 4 shows that
$\lambda = 4$ is an integer solution. As a consequence, $\lambda - 4$ must be a factor of the left side of 4.
Dividing $\lambda - 4$ into $\lambda^3 - 8\lambda^2 + 17\lambda - 4$ shows that 4 can be rewritten as

$$(\lambda - 4)(\lambda^2 - 4\lambda + 1) = 0$$

Thus, the remaining solutions of 4 satisfy the quadratic equation

$$\lambda^2 - 4\lambda + 1 = 0$$

which can be solved by the quadratic formula. Thus the eigenvalues of $A$ are

$$\lambda = 4, \quad \lambda = 2 + \sqrt{3}, \quad \text{and} \quad \lambda = 2 - \sqrt{3}$$

In applications involving large matrices
it is often not feasible to compute the
characteristic equation directly so other
methods must be used to find
eigenvalues. We will consider such
methods in Chapter 9.
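The search for integer roots in Example 3 is mechanical: list the divisors of the constant term, test each one, and divide out the factor that is found. The following sketch does this for a monic polynomial with integer coefficients and a nonzero constant term. The function names are illustrative, and the coefficient-list representation (highest power first) is an assumption of this sketch.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs are highest degree first."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def integer_roots(coeffs):
    """Integer roots of a monic integer polynomial with nonzero constant term.
    Every such root must divide the constant term, so only divisors are tested."""
    constant = abs(coeffs[-1])
    divisors = [d for d in range(1, constant + 1) if constant % d == 0]
    candidates = [s * d for d in divisors for s in (1, -1)]
    return [r for r in candidates if poly_eval(coeffs, r) == 0]

def deflate(coeffs, root):
    """Synthetic division by (x - root); returns the quotient's coefficients.
    Assumes root really is a root, so the remainder is zero."""
    quotient = [coeffs[0]]
    for c in coeffs[1:-1]:
        quotient.append(c + root * quotient[-1])
    return quotient

p = [1, -8, 17, -4]        # lambda^3 - 8 lambda^2 + 17 lambda - 4
print(integer_roots(p))    # [4]
print(deflate(p, 4))       # [1, -4, 1], i.e. lambda^2 - 4 lambda + 1
```

The quotient $\lambda^2 - 4\lambda + 1$ is then handled by the quadratic formula, exactly as in the example.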


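EXAMPLE 4 Eigenvalues of an Upper Triangular Matrix

Find the eigenvalues of the upper triangular matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ 0 & a_{22} & a_{23} & a_{24} \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & a_{44} \end{bmatrix}$$

Recalling that the determinant of a triangular matrix is the product of the entries on
the main diagonal (Theorem 2.1.2), we obtain

$$\det(\lambda I - A) = \det \begin{bmatrix} \lambda - a_{11} & -a_{12} & -a_{13} & -a_{14} \\ 0 & \lambda - a_{22} & -a_{23} & -a_{24} \\ 0 & 0 & \lambda - a_{33} & -a_{34} \\ 0 & 0 & 0 & \lambda - a_{44} \end{bmatrix} = (\lambda - a_{11})(\lambda - a_{22})(\lambda - a_{33})(\lambda - a_{44})$$

Thus, the characteristic equation is

$$(\lambda - a_{11})(\lambda - a_{22})(\lambda - a_{33})(\lambda - a_{44}) = 0$$

and the eigenvalues are

$$\lambda = a_{11}, \quad \lambda = a_{22}, \quad \lambda = a_{33}, \quad \lambda = a_{44}$$

which are precisely the diagonal entries of $A$.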


The  following  general  theorem  should  be  evident  from  the  computations  in  the  preceding  example. 


THEOREM  5.1.2 

If $A$ is an $n \times n$ triangular matrix (upper triangular, lower triangular, or diagonal), then the eigenvalues
of $A$ are the entries on the main diagonal of $A$.


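EXAMPLE 5 Eigenvalues of a Lower Triangular Matrix

By inspection, the eigenvalues of the lower triangular matrix

$$A = \begin{bmatrix} \frac{1}{2} & 0 & 0 \\ -1 & \frac{2}{3} & 0 \\ 5 & -8 & \frac{1}{4} \end{bmatrix}$$

are $\lambda = \frac{1}{2}$, $\lambda = \frac{2}{3}$, and $\lambda = \frac{1}{4}$.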

Had  Theorem  5.1.2  been  available  earlier,  we 
could  have  anticipated  the  result  obtained  in 
Example  2. 


THEOREM  5.1.3 


If $A$ is an $n \times n$ matrix, the following statements are equivalent.

(a) $\lambda$ is an eigenvalue of $A$.

(b) The system of equations $(\lambda I - A)\mathbf{x} = \mathbf{0}$ has nontrivial solutions.

(c) There is a nonzero vector $\mathbf{x}$ such that $A\mathbf{x} = \lambda\mathbf{x}$.

(d) $\lambda$ is a solution of the characteristic equation $\det(\lambda I - A) = 0$.


Finding  Eigenvectors  and  Bases  for  Eigenspaces 

Now that we know how to find the eigenvalues of a matrix, we will consider the problem of finding the
corresponding eigenvectors. Since the eigenvectors corresponding to an eigenvalue $\lambda$ of a matrix $A$ are the
nonzero vectors that satisfy the equation

$$(\lambda I - A)\mathbf{x} = \mathbf{0}$$

these eigenvectors are the nonzero vectors in the null space of the matrix $\lambda I - A$. We call this null space the
eigenspace of $A$ corresponding to $\lambda$. Stated another way, the eigenspace of $A$ corresponding to the eigenvalue
$\lambda$ is the solution space of the homogeneous system $(\lambda I - A)\mathbf{x} = \mathbf{0}$.

Notice  that  x = 0 is  in  every  eigenspace  even 
though  it  is  not  an  eigenvector.  Thus,  it  is  the 
nonzero  vectors  in  an  eigenspace  that  are  the 
eigenvectors. 


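EXAMPLE 6 Bases for Eigenspaces

Find bases for the eigenspaces of the matrix

$$A = \begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix}$$

In Example 2 we found the characteristic equation of $A$ to be

$$(\lambda - 3)(\lambda + 1) = 0$$

from which we obtained the eigenvalues $\lambda = 3$ and $\lambda = -1$. Thus, there are two eigenspaces
of $A$, one corresponding to each of these eigenvalues.

By definition,

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

is an eigenvector of $A$ corresponding to an eigenvalue $\lambda$ if and only if $\mathbf{x}$ is a nontrivial solution
of $(\lambda I - A)\mathbf{x} = \mathbf{0}$, that is, of

$$\begin{bmatrix} \lambda - 3 & 0 \\ -8 & \lambda + 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

If $\lambda = 3$, then this equation becomes

$$\begin{bmatrix} 0 & 0 \\ -8 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

whose general solution is

$$x_1 = \tfrac{1}{2}t, \quad x_2 = t$$

(verify) or in matrix form,

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \frac{1}{2}t \\ t \end{bmatrix} = t \begin{bmatrix} \frac{1}{2} \\ 1 \end{bmatrix}$$

Thus,

$$\begin{bmatrix} \frac{1}{2} \\ 1 \end{bmatrix}$$

is a basis for the eigenspace corresponding to $\lambda = 3$. We leave it as an exercise for you to
follow the pattern of these computations and show that

$$\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

is a basis for the eigenspace corresponding to $\lambda = -1$.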
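Since an eigenspace is the null space of $\lambda I - A$, a basis for it can be computed by row reduction. The sketch below performs Gauss-Jordan elimination with exact rational arithmetic and reads off one basis vector per free variable. The function names are illustrative, and the eigenvalue `lam` is assumed to be known already (for instance, from the characteristic equation).

```python
from fractions import Fraction

def null_space_basis(M):
    """Basis of the solution space of M x = 0, found by Gauss-Jordan elimination."""
    rows = [[Fraction(v) for v in row] for row in M]
    n_cols = len(rows[0])
    pivots = []                      # pivot column of each reduced row, in order
    r = 0                            # index of the next pivot row
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue                 # no pivot here: column c is a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)      # set this free variable to 1, the others to 0
        for row_index, p in enumerate(pivots):
            vec[p] = -rows[row_index][free]
        basis.append(vec)
    return basis

def eigenspace_basis(A, lam):
    """Basis for the null space of lam*I - A."""
    n = len(A)
    M = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    return null_space_basis(M)

A = [[3, 0],
     [8, -1]]
print(eigenspace_basis(A, 3))    # [[Fraction(1, 2), Fraction(1, 1)]]
print(eigenspace_basis(A, -1))   # [[Fraction(0, 1), Fraction(1, 1)]]
```

The output agrees with the bases found in Example 6. If `lam` is not an eigenvalue, the function returns an empty list, since the null space is then the zero vector alone.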


Methods  of  linear  algebra  are  used  in  the  emerging  field  of  computerized  face 
recognition.  Researchers  are  working  with  the  idea  that  every  human  face  in  a racial  group  is  a 
combination  of  a few  dozen  primary  shapes.  For  example,  by  analyzing  three-dimensional  scans  of 
many  faces,  researchers  at  Rockefeller  University  have  produced  both  an  average  head  shape  in  the 


Caucasian  group — dubbed  the  meanhead  (top  row  left  in  the  figure  to  the  left) — and  a set  of 
standardized  variations  from  that  shape,  called  eigenheads  (15  of  which  are  shown  in  the  picture). 
These  are  so  named  because  they  are  eigenvectors  of  a certain  matrix  that  stores  digitized  facial 
information.  Face  shapes  are  represented  mathematically  as  linear  combinations  of  the  eigenheads. 
[Image:  Courtesy  Dr.  Joseph  Atick,  Dr  Norman  Redlich,  and  Dr  Paul  Griffith ] 


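EXAMPLE 7 Eigenvectors and Bases for Eigenspaces

Find bases for the eigenspaces of

$$A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}$$

The characteristic equation of $A$ is $\lambda^3 - 5\lambda^2 + 8\lambda - 4 = 0$, or in factored form,
$(\lambda - 1)(\lambda - 2)^2 = 0$ (verify). Thus, the distinct eigenvalues of $A$ are $\lambda = 1$ and $\lambda = 2$, so there
are two eigenspaces of $A$.

By definition,

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

is an eigenvector of $A$ corresponding to $\lambda$ if and only if $\mathbf{x}$ is a nontrivial solution of
$(\lambda I - A)\mathbf{x} = \mathbf{0}$, or in matrix form,

$$\begin{bmatrix} \lambda & 0 & 2 \\ -1 & \lambda - 2 & -1 \\ -1 & 0 & \lambda - 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \qquad (5)$$

In the case where $\lambda = 2$, Formula 5 becomes

$$\begin{bmatrix} 2 & 0 & 2 \\ -1 & 0 & -1 \\ -1 & 0 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Solving this system using Gaussian elimination yields (verify)

$$x_1 = -s, \quad x_2 = t, \quad x_3 = s$$

Thus, the eigenvectors of $A$ corresponding to $\lambda = 2$ are the nonzero vectors of the form

$$\mathbf{x} = \begin{bmatrix} -s \\ t \\ s \end{bmatrix} = \begin{bmatrix} -s \\ 0 \\ s \end{bmatrix} + \begin{bmatrix} 0 \\ t \\ 0 \end{bmatrix} = s \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} + t \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

Since

$$\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

are linearly independent (why?), these vectors form a basis for the eigenspace corresponding to
$\lambda = 2$.

If $\lambda = 1$, then 5 becomes

$$\begin{bmatrix} 1 & 0 & 2 \\ -1 & -1 & -1 \\ -1 & 0 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Solving this system yields (verify)

$$x_1 = -2s, \quad x_2 = s, \quad x_3 = s$$

Thus, the eigenvectors corresponding to $\lambda = 1$ are the nonzero vectors of the form

$$\begin{bmatrix} -2s \\ s \\ s \end{bmatrix} = s \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$$

so that

$$\begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$$

is a basis for the eigenspace corresponding to $\lambda = 1$.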


Powers  of  a Matrix 

Once the eigenvalues and eigenvectors of a matrix $A$ are found, it is a simple matter to find the eigenvalues
and eigenvectors of any positive integer power of $A$; for example, if $\lambda$ is an eigenvalue of $A$ and $\mathbf{x}$ is a
corresponding eigenvector, then

$$A^2\mathbf{x} = A(A\mathbf{x}) = A(\lambda\mathbf{x}) = \lambda(A\mathbf{x}) = \lambda(\lambda\mathbf{x}) = \lambda^2\mathbf{x}$$

which shows that $\lambda^2$ is an eigenvalue of $A^2$ and that $\mathbf{x}$ is a corresponding eigenvector. In general, we have the
following result.


THEOREM  5.1.4 

If $k$ is a positive integer, $\lambda$ is an eigenvalue of a matrix $A$, and $\mathbf{x}$ is a corresponding eigenvector, then
$\lambda^k$ is an eigenvalue of $A^k$ and $\mathbf{x}$ is a corresponding eigenvector.


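EXAMPLE 8 Powers of a Matrix

In Example 7 we showed that the eigenvalues of

$$A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}$$

are $\lambda = 2$ and $\lambda = 1$, so from Theorem 5.1.4 both $\lambda = 2^7 = 128$ and $\lambda = 1^7 = 1$ are eigenvalues of
$A^7$. We also showed that

$$\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

are eigenvectors of $A$ corresponding to the eigenvalue $\lambda = 2$, so from Theorem 5.1.4 they are also
eigenvectors of $A^7$ corresponding to $\lambda = 2^7 = 128$. Similarly, the eigenvector

$$\begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$$

of $A$ corresponding to the eigenvalue $\lambda = 1$ is also an eigenvector of $A^7$ corresponding to
$\lambda = 1^7 = 1$.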
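The claims in Example 8 can be checked by brute force: form $A^7$ by repeated multiplication and confirm that $A^7\mathbf{x} = \lambda^7\mathbf{x}$ for each eigenpair. The sketch below does this with integer arithmetic, so the comparison is exact. The helper names are illustrative.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(A, k):
    """A**k for a positive integer k, by repeated multiplication."""
    result = A
    for _ in range(k - 1):
        result = mat_mul(result, A)
    return result

def mat_vec(A, x):
    """Multiply a matrix by a column vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[0, 0, -2],
     [1, 2, 1],
     [1, 0, 3]]
A7 = mat_pow(A, 7)

# Eigenpairs (lambda, x) of A taken from Example 7.
eigenpairs = [(2, [-1, 0, 1]), (2, [0, 1, 0]), (1, [-2, 1, 1])]
for lam, x in eigenpairs:
    # Theorem 5.1.4 predicts A^7 x = lam^7 x.
    assert mat_vec(A7, x) == [lam**7 * xi for xi in x]
print("A^7 x = lambda^7 x holds for all three eigenvectors")
```

The point of Theorem 5.1.4 is that this computation is unnecessary in practice: the eigenvalues of $A^7$ follow from those of $A$ without ever forming $A^7$.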


Eigenvalues  and  Invertibility 

The  next  theorem  establishes  a relationship  between  eigenvalues  and  the  invertibility  of  a matrix. 


THEOREM  5.1.5 

A square  matrix  A is  invertible  if  and  only  if  A = 0 is  not  an  eigenvalue  of  A. 


Assume that $A$ is an $n \times n$ matrix and observe first that $\lambda = 0$ is a solution of the characteristic
equation

$$\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0$$

if and only if the constant term $c_n$ is zero. Thus, it suffices to prove that $A$ is invertible if and only if $c_n \neq 0$.
But

$$\det(\lambda I - A) = \lambda^n + c_1\lambda^{n-1} + \cdots + c_n$$

or, on setting $\lambda = 0$,

$$\det(-A) = c_n \quad \text{or} \quad (-1)^n \det(A) = c_n$$

It follows from the last equation that $\det(A) = 0$ if and only if $c_n = 0$, and this in turn implies that $A$ is
invertible if and only if $c_n \neq 0$.


EXAMPLE  9 Eigenvalues  and  Invertibility 

The matrix $A$ in Example 7 is invertible since it has eigenvalues $\lambda = 1$ and $\lambda = 2$, neither of which
is zero. We leave it for you to check this conclusion by showing that $\det(A) \neq 0$.
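The check requested in Example 9 is a single $3 \times 3$ determinant. A minimal sketch, using cofactor expansion along the first row (the function name `det3` is illustrative):

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[0, 0, -2],
     [1, 2, 1],
     [1, 0, 3]]
print(det3(A))  # 4
```

The value 4 is nonzero, so $A$ is invertible, as Theorem 5.1.5 predicts. It is also consistent with the proof of that theorem: the constant term of the characteristic polynomial $\lambda^3 - 5\lambda^2 + 8\lambda - 4$ is $c_3 = -4 = (-1)^3 \det(A)$.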


More  on  the  Equivalence  Theorem 

As  our  final  result  in  this  section,  we  will  use  Theorem  5.1.5  to  add  one  additional  part  to  Theorem  4.10.4. 


Equivalent  Statements 

If $A$ is an $n \times n$ matrix, then the following statements are equivalent.

(a) $A$ is invertible.

(b) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution.

(c) The reduced row echelon form of $A$ is $I_n$.

(d) $A$ is expressible as a product of elementary matrices.

(e) $A\mathbf{x} = \mathbf{b}$ is consistent for every $n \times 1$ matrix $\mathbf{b}$.

(f) $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $n \times 1$ matrix $\mathbf{b}$.

(g) $\det(A) \neq 0$.

(h) The column vectors of $A$ are linearly independent.

(i) The row vectors of $A$ are linearly independent.

(j) The column vectors of $A$ span $R^n$.

(k) The row vectors of $A$ span $R^n$.

(l) The column vectors of $A$ form a basis for $R^n$.

(m) The row vectors of $A$ form a basis for $R^n$.

(n) $A$ has rank $n$.

(o) $A$ has nullity 0.

(p) The orthogonal complement of the null space of $A$ is $R^n$.

(q) The orthogonal complement of the row space of $A$ is $\{\mathbf{0}\}$.

(r) The range of $T_A$ is $R^n$.

(s) $T_A$ is one-to-one.

(t) $\lambda = 0$ is not an eigenvalue of $A$.


This  theorem  relates  all  of  the  major  topics  we  have  studied  thus  far. 


Concept  Review 

Eigenvector 

Eigenvalue 

Characteristic  equation 
Characteristic  polynomial 
Eigenspace 
Equivalence  Theorem 

Skills 

Find  the  eigenvalues  of  a matrix. 

Find  bases  for  the  eigenspaces  of  a matrix. 


Exercise  Set  5.1 


In  Exercises  1-2,  confirm  by  multiplication  that  x is  an  eigenvector  of  A,  and  find  the  corresponding 
eigenvalue. 


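1. $A = \begin{bmatrix} 4 & 0 & 1 \\ 2 & 3 & 2 \\ 1 & 0 & 4 \end{bmatrix}; \quad \mathbf{x} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$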

Answer: 


5 


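2. $A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}; \quad \mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$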

3.  Find  the  characteristic  equations  of  the  following  matrices: 


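(a) $\begin{bmatrix} 3 & 0 \\ 8 & -1 \end{bmatrix}$

(b) $\begin{bmatrix} 10 & -9 \\ 4 & -2 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 3 \\ 4 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} -2 & -7 \\ 1 & 2 \end{bmatrix}$

(e) $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$

(f) $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

Answer:

(a) $\lambda^2 - 2\lambda - 3 = 0$

(b) $\lambda^2 - 8\lambda + 16 = 0$

(c) $\lambda^2 - 12 = 0$

(d) $\lambda^2 + 3 = 0$

(e) $\lambda^2 = 0$

(f) $\lambda^2 - 2\lambda + 1 = 0$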

4.  Find  the  eigenvalues  of  the  matrices  in  Exercise  3 

5.  Find  bases  for  the  eigenspaces  of  the  matrices  in  Exercise  3 

Answer: 


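(a) Basis for eigenspace corresponding to $\lambda = 3$: $\begin{bmatrix} \frac{1}{2} \\ 1 \end{bmatrix}$; basis for eigenspace corresponding to $\lambda = -1$: $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$

(b) Basis for eigenspace corresponding to $\lambda = 4$: $\begin{bmatrix} \frac{3}{2} \\ 1 \end{bmatrix}$

(c) Basis for eigenspace corresponding to $\lambda = \sqrt{12}$: $\begin{bmatrix} \frac{3}{\sqrt{12}} \\ 1 \end{bmatrix}$; basis for eigenspace corresponding to $\lambda = -\sqrt{12}$: $\begin{bmatrix} -\frac{3}{\sqrt{12}} \\ 1 \end{bmatrix}$

(d) There are no eigenspaces.

(e) Basis for eigenspace corresponding to $\lambda = 0$: $\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}$

(f) Basis for eigenspace corresponding to $\lambda = 1$: $\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}$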

6.  Find  the  characteristic  equations  of  the  following  matrices: 


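(a) $\begin{bmatrix} 4 & 0 & 1 \\ -2 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 3 & 0 & -5 \\ \frac{1}{5} & -1 & 0 \\ 1 & 1 & -2 \end{bmatrix}$

(c) $\begin{bmatrix} -2 & 0 & 1 \\ -6 & -2 & 0 \\ 19 & 5 & -4 \end{bmatrix}$

(d) $\begin{bmatrix} -1 & 0 & 1 \\ -1 & 3 & 0 \\ -4 & 13 & -1 \end{bmatrix}$

(e) $\begin{bmatrix} 5 & 0 & 1 \\ 1 & 1 & 0 \\ -7 & 1 & 0 \end{bmatrix}$

(f) $\begin{bmatrix} 5 & 6 & 2 \\ 0 & -1 & -8 \\ 1 & 0 & -2 \end{bmatrix}$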

7.  Find  the  eigenvalues  of  the  matrices  in  Exercise  6. 

Answer: 

(a) $1, 2, 3$

(b) $-\sqrt{2},\ 0,\ \sqrt{2}$

(c) $-8$

(d) $2$

(e) $2$

(f) $-4, 3$

8.  Find  bases  for  the  eigenspaces  of  the  matrices  in  Exercise  6. 

9.  Find  the  characteristic  equations  of  the  following  matrices: 


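(a) $\begin{bmatrix} 0 & 0 & 2 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & -2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

(b) $\begin{bmatrix} 10 & -9 & 0 & 0 \\ 4 & -2 & 0 & 0 \\ 0 & 0 & -2 & -7 \\ 0 & 0 & 1 & 2 \end{bmatrix}$

Answer:

(a) $\lambda^4 + \lambda^3 - 3\lambda^2 - \lambda + 2 = 0$

(b) $\lambda^4 - 8\lambda^3 + 19\lambda^2 - 24\lambda + 48 = 0$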

10.  Find  the  eigenvalues  of  the  matrices  in  Exercise  9. 

11.  Find  bases  for  the  eigenspaces  of  the  matrices  in  Exercise  9. 

Answer: 


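(a) $\lambda = 1$: basis $\begin{bmatrix} 2 \\ 3 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$; $\lambda = -2$: basis $\begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix}$; $\lambda = -1$: basis $\begin{bmatrix} -2 \\ 1 \\ 1 \\ 0 \end{bmatrix}$

(b) $\lambda = 4$: basis $\begin{bmatrix} \frac{3}{2} \\ 1 \\ 0 \\ 0 \end{bmatrix}$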


12.  By  inspection,  find  the  eigenvalues  of  the  following  matrices: 


(a) $\begin{bmatrix} -1 & 6 \\ 0 & 5 \end{bmatrix}$

(b) $\begin{bmatrix} 3 & 0 & 0 \\ -2 & 7 & 0 \\ 4 & 8 & 1 \end{bmatrix}$


(c) 


1 

3 

0 


0 

0 


0 0 0 

0 0 
0 1 0 
0 o 1 


13.  Find  the  eigenvalues  of  fp  for 


1 


0 

0 


3 7 11 


0 0 4 

0 0 2 


Answer: 


14.  Find  the  eigenvalues  and  bases  for  the  eigenspaces  of  A1-’  for 


A = 


-1  -2  -2 

1 2 1 

-1  -1  0 


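15. Let $A$ be a $2 \times 2$ matrix, and call a line through the origin of $R^2$ invariant under $A$ if $A\mathbf{x}$ lies on the line
when $\mathbf{x}$ does. Find equations for all lines in $R^2$, if any, that are invariant under the given matrix.

(a) $A = \begin{bmatrix} 4 & -1 \\ 2 & 1 \end{bmatrix}$

(b) $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$

(c) $A = \begin{bmatrix} 2 & 3 \\ 0 & 2 \end{bmatrix}$

Answer:

(a) $y = x$ and $y = 2x$

(b) No lines

(c) $y = 0$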


16. Find $\det(A)$ given that $A$ has $p(\lambda)$ as its characteristic polynomial.

(a) $p(\lambda) = \lambda^3 - 2\lambda^2 + \lambda + 5$

(b) $p(\lambda) = \lambda^4 - \lambda^3 + 7$

[Hint: See the proof of Theorem 5.1.5.]

17.  Let  A be  an  n x n matrix. 

(a)  Prove  that  the  characteristic  polynomial  of  A has  degree  n. 

(b) Prove that the coefficient of $\lambda^n$ in the characteristic polynomial is 1.

18. Show that the characteristic equation of a $2 \times 2$ matrix $A$ can be expressed as $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$, where $\operatorname{tr}(A)$ is the trace of $A$.

19. Use the result in Exercise 18 to show that if

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

then the solutions of the characteristic equation of $A$ are

$$\lambda = \tfrac{1}{2}\left[(a + d) \pm \sqrt{(a - d)^2 + 4bc}\,\right]$$

Use this result to show that $A$ has

(a) two distinct real eigenvalues if $(a - d)^2 + 4bc > 0$.

(b) two repeated real eigenvalues if $(a - d)^2 + 4bc = 0$.

(c) complex conjugate eigenvalues if $(a - d)^2 + 4bc < 0$.


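20. Let $A$ be the matrix in Exercise 19. Show that if $b \neq 0$, then

$$\mathbf{x}_1 = \begin{bmatrix} -b \\ a - \lambda_1 \end{bmatrix} \quad\text{and}\quad \mathbf{x}_2 = \begin{bmatrix} -b \\ a - \lambda_2 \end{bmatrix}$$

are eigenvectors of $A$ that correspond, respectively, to the eigenvalues

$$\lambda_1 = \tfrac{1}{2}\left[(a + d) + \sqrt{(a - d)^2 + 4bc}\,\right] \quad\text{and}\quad \lambda_2 = \tfrac{1}{2}\left[(a + d) - \sqrt{(a - d)^2 + 4bc}\,\right]$$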


21. Use the result of Exercise 18 to prove that if $p(\lambda)$ is the characteristic polynomial of a $2 \times 2$ matrix $A$, then $p(A) = 0$.

22. Prove: If $a$, $b$, $c$, and $d$ are integers such that $a + b = c + d$, then

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

has integer eigenvalues, namely $\lambda_1 = a + b$ and $\lambda_2 = a - c$.

23. Prove: If $\lambda$ is an eigenvalue of an invertible matrix $A$, and $\mathbf{x}$ is a corresponding eigenvector, then $1/\lambda$ is an eigenvalue of $A^{-1}$, and $\mathbf{x}$ is a corresponding eigenvector.

24. Prove: If $\lambda$ is an eigenvalue of $A$, $\mathbf{x}$ is a corresponding eigenvector, and $s$ is a scalar, then $\lambda - s$ is an eigenvalue of $A - sI$, and $\mathbf{x}$ is a corresponding eigenvector.

25. Prove: If $\lambda$ is an eigenvalue of $A$ and $\mathbf{x}$ is a corresponding eigenvector, then $s\lambda$ is an eigenvalue of $sA$ for every scalar $s$, and $\mathbf{x}$ is a corresponding eigenvector.


26.  Find  the  eigenvalues  and  bases  for  the  eigenspaces  of 


$$A = \begin{bmatrix} -2 & 2 & 3 \\ -2 & 3 & 2 \\ -4 & 2 & 5 \end{bmatrix}$$

and then use Exercises 23 and 24 to find the eigenvalues and bases for the eigenspaces of

(a) $A^{-1}$

(b) $A - 3I$

(c) $A + 2I$

27. (a) Prove that if $A$ is a square matrix, then $A$ and $A^T$ have the same eigenvalues. [Hint: Look at the characteristic equation $\det(\lambda I - A) = 0$.]

(b) Show that $A$ and $A^T$ need not have the same eigenspaces. [Hint: Use the result in Exercise 20 to find a $2 \times 2$ matrix for which $A$ and $A^T$ have different eigenspaces.]


28. Suppose that the characteristic polynomial of some matrix $A$ is found to be $p(\lambda) = (\lambda - 1)(\lambda - 3)^2(\lambda - 4)^3$. In each part, answer the question and explain your reasoning.

(a) What is the size of $A$?

(b)  Is  A invertible? 

(c)  How  many  eigenspaces  does  A have? 


29. The eigenvectors that we have been studying are sometimes called right eigenvectors to distinguish them from left eigenvectors, which are $n \times 1$ column matrices $\mathbf{x}$ that satisfy the equation $\mathbf{x}^T A = \mu \mathbf{x}^T$ for some scalar $\mu$. What is the relationship, if any, between the right eigenvectors and corresponding eigenvalues $\lambda$ of $A$ and the left eigenvectors and corresponding eigenvalues $\mu$ of $A$?

True-False  Exercises 

In  parts  (a)-(g)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a) If $A$ is a square matrix and $A\mathbf{x} = \lambda\mathbf{x}$ for some nonzero scalar $\lambda$, then $\mathbf{x}$ is an eigenvector of $A$.

Answer: 

False 

(b)  If  A is  an  eigenvalue  of  a matrix  A,  then  the  linear  system  (A/  — A)x  = 0 has  only  the  trivial  solution. 
Answer: 

False 

(c) If the characteristic polynomial of a matrix $A$ is $p(\lambda) = \lambda^2 + 1$, then $A$ is invertible.

Answer: 

True 

(d)  If  A is  an  eigenvalue  of  a matrix  A,  then  the  eigenspace  of  A corresponding  to  A is  the  set  of  eigenvectors 
of  A corresponding  to  \. 

Answer: 

False 

(e) If 0 is an eigenvalue of a matrix $A$, then $A^2$ is singular.

Answer: 

True 

(f) The eigenvalues of a matrix $A$ are the same as the eigenvalues of the reduced row echelon form of $A$.
Answer: 

False 

(g)  If  0 is  an  eigenvalue  of  a matrix  A,  then  the  set  of  columns  of  A is  linearly  independent. 

Answer: 


False 
5.2  Diagonalization

In this section we will be concerned with the problem of finding a basis for $R^n$ that consists of eigenvectors of an
n x n matrix  A.  Such  bases  can  be  used  to  study  geometric  properties  of  A and  to  simplify  various  numerical 
computations.  These  bases  are  also  of  physical  significance  in  a wide  variety  of  applications,  some  of  which  will  be 
considered  later  in  this  text. 


The  Matrix  Diagonalization  Problem 

Our  first  objective  in  this  section  is  to  show  that  the  following  two  seemingly  different  problems  are  equivalent. 

Problem 1. Given an $n \times n$ matrix $A$, does there exist an invertible matrix $P$ such that $P^{-1}AP$ is diagonal?
Problem 2. Given an $n \times n$ matrix $A$, does $A$ have $n$ linearly independent eigenvectors?


Similarity 

The matrix product $P^{-1}AP$ that appears in Problem 1 is called a similarity transformation of the matrix $A$. Such
products  are  important  in  the  study  of  eigenvectors  and  eigenvalues,  so  we  will  begin  with  some  terminology  about 
them. 


DEFINITION  1 

If $A$ and $B$ are square matrices, then we say that $B$ is similar to $A$ if there is an invertible matrix $P$ such that $B = P^{-1}AP$.


Note that if $B$ is similar to $A$, then it is also true that $A$ is similar to $B$, since we can express $A$ as $A = Q^{-1}BQ$ by taking $Q = P^{-1}$. This being the case, we will usually say that $A$ and $B$ are similar matrices if either is similar to the other.


Similarity  Invariants 

Similar matrices have many properties in common. For example, if $B = P^{-1}AP$, then it follows that $A$ and $B$ have the same determinant, since

$$\det(B) = \det(P^{-1}AP) = \det(P^{-1})\det(A)\det(P) = \frac{1}{\det(P)}\det(A)\det(P) = \det(A)$$

In  general,  any  property  that  is  shared  by  all  similar  matrices  is  called  a similarity  invariant  or  is  said  to  be 
invariant  under  similarity.  Table  1 lists  the  most  important  similarity  invariants.  The  proofs  of  some  of  these 
results  are  given  as  exercises. 
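
The determinant computation above can be checked numerically. The following is a minimal sketch in exact rational arithmetic; the matrices `A` and `P` are illustrative choices that do not come from the text, and the helper names (`matmul`, `det2`, `inv2`, `trace`) are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    """Inverse of an invertible 2 x 2 matrix, computed exactly."""
    d = Fraction(det2(M))
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

def trace(M):
    """Sum of the diagonal entries."""
    return sum(M[i][i] for i in range(len(M)))

# Illustrative matrices (assumed for this sketch); any invertible P would do.
A = [[4, -1], [2, 1]]
P = [[1, 2], [1, 3]]

B = matmul(matmul(inv2(P), A), P)    # B = P^{-1} A P, so B is similar to A

print(det2(A) == det2(B))            # determinant is a similarity invariant
print(trace(A) == trace(B))          # so is the trace
```

Here `B` looks different from `A`, yet the two matrices share determinant and trace, as the list of invariants predicts.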


Table 1  Similarity Invariants

Property                     Description
Determinant                  $A$ and $P^{-1}AP$ have the same determinant.
Invertibility                $A$ is invertible if and only if $P^{-1}AP$ is invertible.
Rank                         $A$ and $P^{-1}AP$ have the same rank.
Nullity                      $A$ and $P^{-1}AP$ have the same nullity.
Trace                        $A$ and $P^{-1}AP$ have the same trace.
Characteristic polynomial    $A$ and $P^{-1}AP$ have the same characteristic polynomial.
Eigenvalues                  $A$ and $P^{-1}AP$ have the same eigenvalues.
Eigenspace dimension         If $\lambda$ is an eigenvalue of $A$ and hence of $P^{-1}AP$, then the eigenspace of $A$
                             corresponding to $\lambda$ and the eigenspace of $P^{-1}AP$ corresponding to $\lambda$ have
                             the same dimension.

Expressed  in  the  language  of  similarity,  Problem  1 posed  above  is  equivalent  to  asking  whether  the  matrix  A is 
similar  to  a diagonal  matrix.  If  so,  the  diagonal  matrix  will  have  all  of  the  similarity-invariant  properties  of  A,  but 
will  have  a simpler  form,  making  it  easier  to  analyze  and  work  with.  This  important  idea  has  some  associated 
terminology. 


DEFINITION  2 

A square matrix $A$ is said to be diagonalizable if it is similar to some diagonal matrix; that is, if there exists an invertible matrix $P$ such that $P^{-1}AP$ is diagonal. In this case the matrix $P$ is said to diagonalize $A$.


The  following  theorem  shows  that  Problems  1 and  2 posed  above  are  actually  two  different  forms  of  the  same 
mathematical  problem. 


THEOREM  5.2.1 


If $A$ is an $n \times n$ matrix, the following statements are equivalent.


(a) $A$ is diagonalizable.

(b) $A$ has $n$ linearly independent eigenvectors.


Part (b) of Theorem 5.2.1 is equivalent to saying
that there is a basis for $R^n$ consisting of
eigenvectors of $A$. Why?


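Proof $(a) \Rightarrow (b)$ Since $A$ is assumed to be diagonalizable, it follows that there exist an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$ or, equivalently,

$$AP = PD \tag{1}$$

If we denote the column vectors of $P$ by $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$, and if we assume that the diagonal entries of $D$ are $\lambda_1, \lambda_2, \ldots, \lambda_n$, then by Formula 6 of Section 1.3 the left side of (1) can be expressed as

$$AP = A[\mathbf{p}_1 \ \ \mathbf{p}_2 \ \ \cdots \ \ \mathbf{p}_n] = [A\mathbf{p}_1 \ \ A\mathbf{p}_2 \ \ \cdots \ \ A\mathbf{p}_n]$$

and, as noted in the comment following Example 1 of Section 1.7, the right side of (1) can be expressed as

$$PD = [\lambda_1\mathbf{p}_1 \ \ \lambda_2\mathbf{p}_2 \ \ \cdots \ \ \lambda_n\mathbf{p}_n]$$

Thus, it follows from (1) that

$$A\mathbf{p}_1 = \lambda_1\mathbf{p}_1, \quad A\mathbf{p}_2 = \lambda_2\mathbf{p}_2, \quad \ldots, \quad A\mathbf{p}_n = \lambda_n\mathbf{p}_n \tag{2}$$

Since $P$ is invertible, we know from Theorem 5.1.6 that its column vectors $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$ are linearly independent (and hence nonzero). Thus, it follows from (2) that these $n$ column vectors are eigenvectors of $A$.

$(b) \Rightarrow (a)$ Assume that $A$ has $n$ linearly independent eigenvectors, $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$, and that $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the corresponding eigenvalues. If we let

$$P = [\mathbf{p}_1 \ \ \mathbf{p}_2 \ \ \cdots \ \ \mathbf{p}_n]$$

and if we let $D$ be the diagonal matrix that has $\lambda_1, \lambda_2, \ldots, \lambda_n$ as its successive diagonal entries, then

$$AP = A[\mathbf{p}_1 \ \ \mathbf{p}_2 \ \ \cdots \ \ \mathbf{p}_n] = [A\mathbf{p}_1 \ \ A\mathbf{p}_2 \ \ \cdots \ \ A\mathbf{p}_n] = [\lambda_1\mathbf{p}_1 \ \ \lambda_2\mathbf{p}_2 \ \ \cdots \ \ \lambda_n\mathbf{p}_n] = PD$$

Since the column vectors of $P$ are linearly independent, it follows from Theorem 5.1.6 that $P$ is invertible, so that this last equation can be rewritten as $P^{-1}AP = D$, which shows that $A$ is diagonalizable.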


Procedure  for  Diagonalizing  a Matrix 

The  preceding  theorem  guarantees  that  an  n x n matrix  A with  n linearly  independent  eigenvectors  is 
diagonalizable,  and  the  proof  suggests  the  following  method  for  diagonalizing  A. 


Procedure  for  Diagonalizing  a Matrix 


Step  1.  Confirm  that  the  matrix  is  actually  diagonalizable  by  finding  n linearly  independent  eigenvectors. 
One  way  to  do  this  is  by  finding  a basis  for  each  eigenspace  and  merging  these  basis  vectors  into  a single 
set  S.  If  this  set  has  fewer  than  n vectors,  then  the  matrix  is  not  diagonalizable. 

Step 2. Form the matrix $P = [\mathbf{p}_1 \ \ \mathbf{p}_2 \ \ \cdots \ \ \mathbf{p}_n]$ that has the vectors in $S$ as its column vectors.

Step 3. The matrix $P^{-1}AP$ will be diagonal and have the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ corresponding to the eigenvectors $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$ as its successive diagonal entries.


EXAMPLE  1 Finding  a Matrix  P That  Diagonalizes  a Matrix  A 


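Find a matrix $P$ that diagonalizes

$$A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}$$

In Example 7 of the preceding section we found the characteristic equation of $A$ to be

$$(\lambda - 1)(\lambda - 2)^2 = 0$$

and we found the following bases for the eigenspaces:

$$\lambda = 2: \quad \mathbf{p}_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{p}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}; \qquad \lambda = 1: \quad \mathbf{p}_3 = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$$

There are three basis vectors in total, so the matrix

$$P = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$$

diagonalizes $A$. As a check, you should verify that

$$P^{-1}AP = \begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ -1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix} \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$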
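
The check suggested in Example 1 can also be carried out by machine. The sketch below uses exact rational arithmetic and a small Gauss-Jordan routine; the function names `matmul` and `inverse` are hypothetical helpers written for this illustration, and the routine assumes its input is invertible.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination over the rationals.

    Assumes M is invertible (a singular input raises StopIteration).
    """
    n = len(M)
    # Augment M with the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Bring a row with a nonzero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear the rest of the column.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

A = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]
P = [[-1, 0, -2], [0, 1, 1], [1, 0, 1]]    # columns are p1, p2, p3

D = matmul(matmul(inverse(P), A), P)
print(D == [[2, 0, 0], [0, 2, 0], [0, 0, 1]])   # True
```

The diagonal entries appear in the same order as the eigenvectors placed in the columns of `P`.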

In general, there is no preferred order for the columns of $P$. Since the $i$th diagonal entry of $P^{-1}AP$ is an eigenvalue for the $i$th column vector of $P$, changing the order of the columns of $P$ just changes the order of the eigenvalues on the diagonal of $P^{-1}AP$. Thus, had we written

$$P = \begin{bmatrix} -1 & -2 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}$$

in the preceding example, we would have obtained

$$P^{-1}AP = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$


EXAMPLE  2 A Matrix  That  Is  Not  Diagonalizable 

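Find a matrix $P$ that diagonalizes

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ -3 & 5 & 2 \end{bmatrix}$$

The characteristic polynomial of $A$ is

$$\det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & 0 & 0 \\ -1 & \lambda - 2 & 0 \\ 3 & -5 & \lambda - 2 \end{vmatrix} = (\lambda - 1)(\lambda - 2)^2$$

so the characteristic equation is

$$(\lambda - 1)(\lambda - 2)^2 = 0$$

Thus, the distinct eigenvalues of $A$ are $\lambda = 1$ and $\lambda = 2$. We leave it for you to show that bases for the eigenspaces are

$$\lambda = 1: \quad \mathbf{p}_1 = \begin{bmatrix} \tfrac{1}{8} \\ -\tfrac{1}{8} \\ 1 \end{bmatrix}; \qquad \lambda = 2: \quad \mathbf{p}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

Since $A$ is a $3 \times 3$ matrix and there are only two basis vectors in total, $A$ is not diagonalizable.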

If you are interested only in determining whether a matrix is diagonalizable and not in actually finding a diagonalizing matrix $P$, then it is not necessary to compute bases for the eigenspaces; it suffices to find the dimensions of the eigenspaces. For this example, the eigenspace corresponding to $\lambda = 1$ is the solution space of the system

$$\begin{bmatrix} 0 & 0 & 0 \\ -1 & -1 & 0 \\ 3 & -5 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Since the coefficient matrix has rank 2 (verify), the nullity of this matrix is 1 by Theorem 4.8.2, and hence the eigenspace corresponding to $\lambda = 1$ is one-dimensional.

The eigenspace corresponding to $\lambda = 2$ is the solution space of the system

$$\begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 0 \\ 3 & -5 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

This coefficient matrix also has rank 2 and nullity 1 (verify), so the eigenspace corresponding to $\lambda = 2$ is also one-dimensional. Since the eigenspaces produce a total of two basis vectors, and since three are needed, the matrix $A$ is not diagonalizable.
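
The dimension count used in this example (nullity of $\lambda I - A$ equals $n$ minus its rank) can be sketched in code. This is an illustration under the assumption that the eigenvalues are already known; `rank` and `eigenspace_dimension` are hypothetical helper names, and the arithmetic is exact.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix via row reduction in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def eigenspace_dimension(A, lam):
    """Nullity of (lam*I - A), that is, n minus the rank."""
    n = len(A)
    shifted = [[(lam if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    return n - rank(shifted)

A = [[1, 0, 0], [1, 2, 0], [-3, 5, 2]]            # matrix of Example 2
dims = {lam: eigenspace_dimension(A, lam) for lam in (1, 2)}

print(dims)                                        # {1: 1, 2: 1}
print(sum(dims.values()) == len(A))                # False: not diagonalizable
```

Because the dimensions sum to 2 rather than 3, no set of three linearly independent eigenvectors exists.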


There  is  an  assumption  in  Example  1 that  the  column  vectors  of  P,  which  are  made  up  of  basis  vectors  from  the 
various  eigenspaces  of  A,  are  linearly  independent.  The  following  theorem,  proved  at  the  end  of  this  section,  shows 
that  this  is  so. 


THEOREM  5.2.2 

If $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are eigenvectors of a matrix $A$ corresponding to distinct eigenvalues, then $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is a linearly independent set.


Theorem 5.2.2 is a special case of a more general result: Suppose that $\lambda_1, \lambda_2, \ldots, \lambda_k$ are distinct
eigenvalues  and  that  we  choose  a linearly  independent  set  in  each  of  the  corresponding  eigenspaces.  If  we  then 
merge  all  these  vectors  into  a single  set,  the  result  will  still  be  a linearly  independent  set.  For  example,  if  we  choose 
three  linearly  independent  vectors  from  one  eigenspace  and  two  linearly  independent  vectors  from  another 
eigenspace,  then  the  five  vectors  together  form  a linearly  independent  set.  We  omit  the  proof. 


As  a consequence  of  Theorem  5.2.2,  we  obtain  the  following  important  result. 


THEOREM  5.2.3 

If  an  n x n matrix  A has  n distinct  eigenvalues,  then  A is  diagonalizable. 


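If $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are eigenvectors corresponding to the distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, then by Theorem 5.2.2, $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ are linearly independent. Thus, $A$ is diagonalizable by Theorem 5.2.1.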


EXAMPLE  3 Using  Theorem  5.2.3 


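We saw in Example 3 of the preceding section that

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 4 & -17 & 8 \end{bmatrix}$$

has three distinct eigenvalues: $\lambda = 4$, $\lambda = 2 + \sqrt{3}$, and $\lambda = 2 - \sqrt{3}$. Therefore, $A$ is diagonalizable and

$$P^{-1}AP = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 2 + \sqrt{3} & 0 \\ 0 & 0 & 2 - \sqrt{3} \end{bmatrix}$$

for some invertible matrix $P$. If needed, the matrix $P$ can be found using the method shown in Example 1 of this section.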


EXAMPLE  4 Diagonalizability  of  Triangular  Matrices 


From  Theorem  5.1.2,  the  eigenvalues  of  a triangular  matrix  are  the  entries  on  its  main  diagonal. 
Thus,  a triangular  matrix  with  distinct  entries  on  the  main  diagonal  is  diagonalizable.  For  example, 


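$$A = \begin{bmatrix} -1 & 2 & 4 & 0 \\ 0 & 3 & 1 & 7 \\ 0 & 0 & 5 & 8 \\ 0 & 0 & 0 & -2 \end{bmatrix}$$

is a diagonalizable matrix with eigenvalues $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 5$, $\lambda_4 = -2$.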


Computing  Powers  of  a Matrix 

There  are  many  applications  in  which  it  is  necessary  to  compute  high  powers  of  a square  matrix  A.  We  will  show 
next  that  if  A happens  to  be  diagonalizable,  then  the  computations  can  be  simplified  by  diagonalizing  A. 


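To start, suppose that $A$ is a diagonalizable $n \times n$ matrix, that $P$ diagonalizes $A$, and that

$$P^{-1}AP = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = D$$

Squaring both sides of this equation yields

$$(P^{-1}AP)^2 = \begin{bmatrix} \lambda_1^2 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^2 \end{bmatrix} = D^2$$

We can rewrite the left side of this equation as

$$(P^{-1}AP)^2 = P^{-1}APP^{-1}AP = P^{-1}AIAP = P^{-1}A^2P$$

from which we obtain the relationship $P^{-1}A^2P = D^2$. More generally, if $k$ is a positive integer, then a similar computation will show that

$$P^{-1}A^kP = D^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}$$

which we can rewrite as

$$A^k = PD^kP^{-1} = P \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix} P^{-1} \tag{3}$$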


Formula  3 reveals  that  raising  a diagonalizable 
matrix  A to  a positive  integer  power  has  the  effect 
of  raising  its  eigenvalues  to  that  power. 


Note that computing the right side of this formula involves only three matrix multiplications and the powers of the diagonal entries of $D$. For matrices of large size and high powers $k$, this involves substantially fewer operations than computing $A^k$ directly.

EXAMPLE  5 Power  of  a Matrix 


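Use (3) to find $A^{13}$, where

$$A = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}$$

We showed in Example 1 that the matrix $A$ is diagonalized by

$$P = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$$

and that

$$D = P^{-1}AP = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Thus, it follows from (3) that

$$A^{13} = PD^{13}P^{-1} = \begin{bmatrix} -1 & 0 & -2 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2^{13} & 0 & 0 \\ 0 & 2^{13} & 0 \\ 0 & 0 & 1^{13} \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ -1 & 0 & -1 \end{bmatrix} = \begin{bmatrix} -8190 & 0 & -16382 \\ 8191 & 8192 & 8191 \\ 8191 & 0 & 16383 \end{bmatrix} \tag{4}$$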


With the method in the preceding example, most of the work is in diagonalizing $A$. Once that work is done, it can be used to compute any power of $A$. Thus, to compute $A^{1000}$ we need only change the exponents from 13 to 1000 in (4).
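
As an illustration of Formula (3), the following sketch computes $A^{13}$ for the matrix of Example 5 in two ways: through $PD^kP^{-1}$ and by repeated multiplication. The function names are hypothetical helpers, and $P$, $P^{-1}$, and the eigenvalues are taken from Example 1 rather than computed.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def power_direct(A, k):
    """Compute A^k (k >= 1) by k - 1 successive multiplications."""
    result = A
    for _ in range(k - 1):
        result = matmul(result, A)
    return result

def power_diagonalized(P, eigenvalues, P_inv, k):
    """Compute P D^k P^{-1}, where D = diag(eigenvalues)."""
    n = len(P)
    Dk = [[eigenvalues[i] ** k if i == j else 0 for j in range(n)] for i in range(n)]
    return matmul(matmul(P, Dk), P_inv)

A = [[0, 0, -2], [1, 2, 1], [1, 0, 3]]
P = [[-1, 0, -2], [0, 1, 1], [1, 0, 1]]
P_inv = [[1, 0, 2], [1, 1, 1], [-1, 0, -1]]

A13 = power_diagonalized(P, [2, 2, 1], P_inv, 13)
print(A13)                            # [[-8190, 0, -16382], [8191, 8192, 8191], [8191, 0, 16383]]
print(A13 == power_direct(A, 13))     # True
```

Changing the last argument from 13 to 1000 gives $A^{1000}$ with the same two matrix products, whereas the direct method needs 999 of them.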


Eigenvalues  of  Powers  of  a Matrix 


Once  the  eigenvalues  and  eigenvectors  of  any  square  matrix  A are  found,  it  is  a simple  matter  to  find  the 
eigenvalues  and  eigenvectors  of  any  positive  integer  power  of  A.  For  example,  if  A is  an  eigenvalue  of  A and  x is  a 
corresponding  eigenvector,  then 

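$$A^2\mathbf{x} = A(A\mathbf{x}) = A(\lambda\mathbf{x}) = \lambda(A\mathbf{x}) = \lambda(\lambda\mathbf{x}) = \lambda^2\mathbf{x}$$

which shows not only that $\lambda^2$ is an eigenvalue of $A^2$ but that $\mathbf{x}$ is a corresponding eigenvector. In general, we have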
the  following  result. 

Note  that  diagonalizability  is  not  a requirement  in 
Theorem  5.2.4. 


THEOREM  5.2.4 

If $\lambda$ is an eigenvalue of a square matrix $A$ and $\mathbf{x}$ is a corresponding eigenvector, and if $k$ is any positive integer, then $\lambda^k$ is an eigenvalue of $A^k$ and $\mathbf{x}$ is a corresponding eigenvector.


Some  problems  that  use  this  theorem  are  given  in  the  exercises. 
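
A small numerical check of Theorem 5.2.4 is sketched below. It uses the matrix of Example 2, which is not diagonalizable, to emphasize that the theorem does not require diagonalizability; the choice $k = 10$ and the helper name `matvec` are assumptions made for this illustration.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

A = [[1, 0, 0], [1, 2, 0], [-3, 5, 2]]   # matrix of Example 2 (not diagonalizable)
lam, x = 2, [0, 0, 1]                    # A x = 2 x, so x is an eigenvector for lambda = 2
k = 10

# Apply A to x a total of k times, which computes A^k x.
y = x
for _ in range(k):
    y = matvec(A, y)

print(y)                                 # [0, 0, 1024]
print(y == [lam ** k * v for v in x])    # True: A^k x = lambda^k x
```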


Geometric  and  Algebraic  Multiplicity 

Theorem  5.2.3  does  not  completely  settle  the  diagonalizability  question  since  it  only  guarantees  that  a square 
matrix  with  n distinct  eigenvalues  is  diagonalizable,  but  does  not  preclude  the  possibility  that  there  may  exist 
diagonalizable  matrices  with  fewer  than  n distinct  eigenvalues.  The  following  example  shows  that  this  is  indeed  the 
case. 


EXAMPLE  6 The  Converse  of  Theorem  5.2.3  Is  False 

Consider the matrices

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad J = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$

It follows from Theorem 5.1.2 that both of these matrices have only one distinct eigenvalue, namely $\lambda = 1$, and hence only one eigenspace. We leave it as an exercise for you to solve the systems

$$(\lambda I - I)\mathbf{x} = \mathbf{0} \quad\text{and}\quad (\lambda I - J)\mathbf{x} = \mathbf{0}$$

with $\lambda = 1$ and show that for $I$ the eigenspace is three-dimensional (all of $R^3$) and for $J$ it is one-dimensional, consisting of all scalar multiples of

$$\mathbf{x} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$

This shows that the converse of Theorem 5.2.3 is false, since we have produced two $3 \times 3$ matrices with fewer than three distinct eigenvalues, one of which is diagonalizable and the other of which is not.


A full  excursion  into  the  study  of  diagonalizability  is  left  for  more  advanced  courses,  but  we  will  touch  on  one 
theorem  that  is  important  to  a fuller  understanding  of  diagonalizability.  It  can  be  proved  that  if  Aq  is  an  eigenvalue 
of  A,  then  the  dimension  of  the  eigenspace  corresponding  to  Aq  cannot  exceed  the  number  of  times  that  A — Aq 
appears  as  a factor  of  the  characteristic  polynomial  of  A.  For  example,  in  Example  1 and  Example  2 the 
characteristic  polynomial  is 

(A- l)(A-2)2 

Thus,  the  eigenspace  corresponding  to  A = 1 is  at  most  (hence  exactly)  one-dimensional,  and  the  eigenspace 
corresponding  to  A = 2 is  at  most  two-dimensional.  In  Example  1 the  eigenspace  corresponding  to  A = 2 actually 
had  dimension  2,  resulting  in  diagonalizability,  but  in  Example  2 the  eigenspace  corresponding  to  A = 2 had  only 
dimension  1,  resulting  in  nondiagonalizability. 

There  is  some  terminology  that  is  related  to  these  ideas.  If  Aq  is  an  eigenvalue  of  an  ^ x n matrix  A,  then  the 
dimension  of  the  eigenspace  corresponding  to  Aq  is  called  the  geometric  multiplicity  of  Aq,  and  the  number  of 
times  that  A — Aq  appears  as  a factor  in  the  characteristic  polynomial  of  A is  called  the  algebraic  multiplicity  of  Aq. 
The  following  theorem,  which  we  state  without  proof,  summarizes  the  preceding  discussion. 


THEOREM 5.2.5  Geometric and Algebraic Multiplicity

If  A is  a square  matrix,  then: 

(a)  For  every  eigenvalue  of  A,  the  geometric  multiplicity  is  less  than  or  equal  to  the  algebraic  multiplicity. 

(b)  A is  diagonalizable  if  and  only  if  the  geometric  multiplicity  of  every  eigenvalue  is  equal  to  the 
algebraic  multiplicity. 
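
The two multiplicities can be compared directly for the matrices $I$ and $J$ of Example 6. The sketch below relies on the fact that both matrices are triangular, so the algebraic multiplicity of an eigenvalue is the number of times it appears on the main diagonal; this shortcut would not apply to a general matrix, and the helper names are hypothetical.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix via row reduction in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def multiplicities(T, lam):
    """Return (geometric, algebraic) multiplicity of lam for a TRIANGULAR matrix T."""
    n = len(T)
    shifted = [[(lam if i == j else 0) - T[i][j] for j in range(n)] for i in range(n)]
    geometric = n - rank(shifted)                            # dimension of the eigenspace
    algebraic = sum(1 for i in range(n) if T[i][i] == lam)   # valid for triangular T only
    return geometric, algebraic

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
J = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]

print(multiplicities(I3, 1))   # (3, 3): multiplicities agree, so I is diagonalizable
print(multiplicities(J, 1))    # (1, 3): geometric < algebraic, so J is not
```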


OPTIONAL 

We  will  complete  this  section  with  an  optional  proof  of  Theorem  5.2.2. 

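Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ be eigenvectors of $A$ corresponding to distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$. We will assume that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly dependent and obtain a contradiction. We can then conclude that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are linearly independent.

Since an eigenvector is nonzero by definition, $\{\mathbf{v}_1\}$ is linearly independent. Let $r$ be the largest integer such that $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is linearly independent. Since we are assuming that $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is linearly dependent, $r$ satisfies $1 \le r < k$. Moreover, by the definition of $r$, $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r+1}\}$ is linearly dependent. Thus, there are scalars $c_1, c_2, \ldots, c_{r+1}$, not all zero, such that

$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_{r+1}\mathbf{v}_{r+1} = \mathbf{0} \tag{5}$$

Multiplying both sides of (5) by $A$ and using the fact that

$$A\mathbf{v}_1 = \lambda_1\mathbf{v}_1, \quad A\mathbf{v}_2 = \lambda_2\mathbf{v}_2, \quad \ldots, \quad A\mathbf{v}_{r+1} = \lambda_{r+1}\mathbf{v}_{r+1}$$

we obtain

$$c_1\lambda_1\mathbf{v}_1 + c_2\lambda_2\mathbf{v}_2 + \cdots + c_{r+1}\lambda_{r+1}\mathbf{v}_{r+1} = \mathbf{0} \tag{6}$$

If we now multiply both sides of (5) by $\lambda_{r+1}$ and subtract the resulting equation from (6) we obtain

$$c_1(\lambda_1 - \lambda_{r+1})\mathbf{v}_1 + c_2(\lambda_2 - \lambda_{r+1})\mathbf{v}_2 + \cdots + c_r(\lambda_r - \lambda_{r+1})\mathbf{v}_r = \mathbf{0}$$

Since $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is a linearly independent set, this equation implies that

$$c_1(\lambda_1 - \lambda_{r+1}) = c_2(\lambda_2 - \lambda_{r+1}) = \cdots = c_r(\lambda_r - \lambda_{r+1}) = 0$$

and since $\lambda_1, \lambda_2, \ldots, \lambda_{r+1}$ are assumed to be distinct, it follows that

$$c_1 = c_2 = \cdots = c_r = 0 \tag{7}$$

Substituting these values in (5) yields

$$c_{r+1}\mathbf{v}_{r+1} = \mathbf{0}$$

Since the eigenvector $\mathbf{v}_{r+1}$ is nonzero, it follows that

$$c_{r+1} = 0 \tag{8}$$

But equations (7) and (8) contradict the fact that $c_1, c_2, \ldots, c_{r+1}$ are not all zero, so the proof is complete.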


Concept  Review 

Similarity  transformation 
Similarity  invariant 
Similar  matrices 
Diagonalizable  matrix 
Geometric  multiplicity 
Algebraic  multiplicity 

Skills 

Determine  whether  a square  matrix  A is  diagonalizable. 

Diagonalize a square matrix $A$.

Find  powers  of  a matrix  using  similarity. 

Find  the  geometric  multiplicity  and  the  algebraic  multiplicity  of  an  eigenvalue. 


Exercise  Set  5.2 


In  Exercises  1—4,  show  that  A and  B are  not  similar  matrices. 

1. $A = \begin{bmatrix} 1 & 1 \\ 3 & 2 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 0 \\ 3 & -2 \end{bmatrix}$


Answer: 

Possible  reason:  Determinants  are  different. 


2. $A = \begin{bmatrix} 4 & -1 \\ 2 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 4 & 1 \\ 2 & 4 \end{bmatrix}$


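3. $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 & 0 \\ \tfrac{1}{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$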

Answer: 

Possible  reason:  Ranks  are  different. 


4. $A = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 0 & 2 \\ 3 & 0 & 3 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 1 & 0 \\ 2 & 2 & 0 \\ 0 & 1 & 1 \end{bmatrix}$

5. Let $A$ be a $6 \times 6$ matrix with characteristic equation $\lambda^2(\lambda - 1)(\lambda - 2)^3 = 0$. What are the possible dimensions for eigenspaces of $A$?

Answer:

$\lambda = 0$: 1 or 2; $\lambda = 1$: 1; $\lambda = 2$: 1, 2, or 3

6. Let

$$A = \begin{bmatrix} 4 & 0 & 1 \\ 2 & 3 & 2 \\ 1 & 0 & 4 \end{bmatrix}$$

(a) Find the eigenvalues of $A$.

(b) For each eigenvalue $\lambda$, find the rank of the matrix $\lambda I - A$.

(c)  Is  A diagonalizable?  Justify  your  conclusion. 

In  Exercises  7-11,  use  the  method  of  Exercise  6 to  determine  whether  the  matrix  is  diagonalizable. 

7. $\begin{bmatrix} 2 & 0 \\ 1 & 2 \end{bmatrix}$


Answer: 


Not  diagonalizable 


8. $\begin{bmatrix} 2 & -3 \\ 1 & -1 \end{bmatrix}$


0 2 0 
0 1 2 


Answer: 


Not  diagonalizable 


10. $\begin{bmatrix} -1 & 0 & 1 \\ -1 & 3 & 0 \\ -4 & 13 & -1 \end{bmatrix}$


11. $\begin{bmatrix} 2 & -1 & 0 & 1 \\ 0 & 2 & 1 & -1 \\ 0 & 0 & 3 & 2 \\ 0 & 0 & 0 & 3 \end{bmatrix}$


Answer: 


Not  diagonalizable 

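In Exercises 12-15, find a matrix $P$ that diagonalizes $A$, and compute $P^{-1}AP$.

12. $A = \begin{bmatrix} -14 & 12 \\ -20 & 17 \end{bmatrix}$

13. $A = \begin{bmatrix} 1 & 0 \\ 6 & -1 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} \tfrac{1}{3} & 0 \\ 1 & 1 \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$

14. $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}$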

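15. $A = \begin{bmatrix} 2 & 0 & -2 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} -2 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}$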

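In Exercises 16-21, find the geometric and algebraic multiplicity of each eigenvalue of the matrix $A$, and determine whether $A$ is diagonalizable. If $A$ is diagonalizable, then find a matrix $P$ that diagonalizes $A$, and find $P^{-1}AP$.

16. $A = \begin{bmatrix} 19 & -9 & -6 \\ 25 & -11 & -9 \\ 17 & -9 & -4 \end{bmatrix}$

17. $A = \begin{bmatrix} -1 & 4 & -2 \\ -3 & 4 & 0 \\ -3 & 1 & 3 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} 1 & 2 & 1 \\ 1 & 3 & 3 \\ 1 & 3 & 4 \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$

18. $A = \begin{bmatrix} 5 & 0 & 0 \\ 1 & 5 & 0 \\ 0 & 1 & 5 \end{bmatrix}$

19. $A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 3 & 0 & 1 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} -\tfrac{1}{3} & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

20. $A = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 1 & 3 \end{bmatrix}$

21. $A = \begin{bmatrix} -2 & 0 & 0 & 0 \\ 0 & -2 & 5 & -5 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$; $P^{-1}AP = \begin{bmatrix} -2 & 0 & 0 & 0 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}$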

22.  Use  the  method  of  Example  5 to  compute  where 


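23. Use the method of Example 5 to compute $A^{11}$, where

$$A = \begin{bmatrix} -1 & 7 & -1 \\ 0 & 1 & 0 \\ 0 & 15 & -2 \end{bmatrix}$$

Answer:

$$\begin{bmatrix} -1 & 10237 & -2047 \\ 0 & 1 & 0 \\ 0 & 10245 & -2048 \end{bmatrix}$$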

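24. In each part, compute the stated power of

$$A = \begin{bmatrix} 1 & -2 & 8 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$$

(a) $A^{1000}$  (b) $A^{-1000}$  (c) $A^{2301}$  (d) $A^{-2301}$

25. Find $A^n$ if $n$ is a positive integer and

$$A = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 3 \end{bmatrix}$$

Answer:

$$A^n = PD^nP^{-1} = \begin{bmatrix} 1 & 1 & 1 \\ 2 & 0 & -1 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3^n & 0 \\ 0 & 0 & 4^n \end{bmatrix} \begin{bmatrix} \tfrac{1}{6} & \tfrac{1}{3} & \tfrac{1}{6} \\ \tfrac{1}{2} & 0 & -\tfrac{1}{2} \\ \tfrac{1}{3} & -\tfrac{1}{3} & \tfrac{1}{3} \end{bmatrix}$$

26. Let

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

Show that

(a) $A$ is diagonalizable if $(a - d)^2 + 4bc > 0$.

(b) $A$ is not diagonalizable if $(a - d)^2 + 4bc < 0$.

[Hint: See Exercise 19 of Section 5.1.]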

27. In the case where the matrix $A$ in Exercise 26 is diagonalizable, find a matrix $P$ that diagonalizes $A$. [Hint: See Exercise 20 of Section 5.1.]

Answer:

One possibility is $P = \begin{bmatrix} -b & -b \\ a - \lambda_1 & a - \lambda_2 \end{bmatrix}$, where $\lambda_1$ and $\lambda_2$ are as in Exercise 20 of Section 5.1.


28.  Prove  that  similar  matrices  have  the  same  rank. 

29.  Prove  that  similar  matrices  have  the  same  nullity. 


30.  Prove  that  similar  matrices  have  the  same  trace. 

31. Prove that if $A$ is diagonalizable, then so is $A^k$ for every positive integer $k$.

32.  Prove  that  if  A is  a diagonalizable  matrix,  then  the  rank  of  A is  the  number  of  nonzero  eigenvalues  of  A. 

33. Suppose that the characteristic polynomial of some matrix $A$ is found to be $p(\lambda) = (\lambda - 1)(\lambda - 3)^2(\lambda - 4)^3$. In each part, answer the question and explain your reasoning.

(a) What can you say about the dimensions of the eigenspaces of $A$?

(b) What can you say about the dimensions of the eigenspaces if you know that $A$ is diagonalizable?

(c) If $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a linearly independent set of eigenvectors of $A$ all of which correspond to the same eigenvalue of $A$, what can you say about the eigenvalue?


Answer:

(a) $\lambda = 1$: dimension $= 1$; $\lambda = 3$: dimension $\le 2$; $\lambda = 4$: dimension $\le 3$

(b) Dimensions will be exactly 1, 2, and 3.

(c) $\lambda = 4$


34. This problem will lead you through a proof of the fact that the algebraic multiplicity of an eigenvalue of an $n \times n$ matrix $A$ is greater than or equal to the geometric multiplicity. For this purpose, assume that $\lambda_0$ is an eigenvalue with geometric multiplicity $k$.

(a) Prove that there is a basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ for $R^n$ in which the first $k$ vectors of $B$ form a basis for the eigenspace corresponding to $\lambda_0$.

(b) Let $P$ be the matrix having the vectors in $B$ as columns. Prove that the product $AP$ can be expressed as

$$AP = P \begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix}$$

[Hint: Compare the first $k$ column vectors on both sides.]

(c) Use the result in part (b) to prove that $A$ is similar to

$$C = \begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix}$$

and hence that $A$ and $C$ have the same characteristic polynomial.

(d) By considering $\det(\lambda I - C)$, prove that the characteristic polynomial of $C$ (and hence $A$) contains the factor $(\lambda - \lambda_0)$ at least $k$ times, thereby proving that the algebraic multiplicity of $\lambda_0$ is greater than or equal to the geometric multiplicity $k$.


True-False  Exercises 


In  parts  (a)-(h)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  Every  square  matrix  is  similar  to  itself. 

Answer: 

True 

(b)  If  A,  B,  and  C are  matrices  for  which  A is  similar  to  B and  B is  similar  to  C,  then  A is  similar  to  C. 


Answer: 


True 

(c) If $A$ and $B$ are similar invertible matrices, then $A^{-1}$ and $B^{-1}$ are similar.

Answer: 

True 

(d) If $A$ is diagonalizable, then there is a unique matrix $P$ such that $P^{-1}AP$ is diagonal.

Answer: 

False 

(e) If $A$ is diagonalizable and invertible, then $A^{-1}$ is diagonalizable.

Answer: 

True 

(f) If $A$ is diagonalizable, then $A^T$ is diagonalizable.

Answer: 

True 

(g) If there is a basis for $R^n$ consisting of eigenvectors of an $n \times n$ matrix $A$, then $A$ is diagonalizable.
Answer: 

True 

(h) If every eigenvalue of a matrix $A$ has algebraic multiplicity 1, then $A$ is diagonalizable.

Answer: 

True 
5.3  Complex  Vector  Spaces

Because  the  characteristic  equation  of  any  square  matrix  can  have  complex  solutions,  the  notions  of  complex  eigenvalues  and 
eigenvectors  arise  naturally,  even  within  the  context  of  matrices  with  real  entries.  In  this  section  we  will  discuss  this  idea  and 
apply  our  results  to  study  symmetric  matrices  in  more  detail.  A review  of  the  essentials  of  complex  numbers  appears  in  the 
back  of  this  text. 


Review  of  Complex  Numbers 


Recall that if $z = a + bi$ is a complex number, then:

• $\operatorname{Re}(z) = a$ and $\operatorname{Im}(z) = b$ are called the real part of z and the imaginary part of z, respectively,

• $|z| = \sqrt{a^2 + b^2}$ is called the modulus (or absolute value) of z,

• $\bar{z} = a - bi$ is called the complex conjugate of z,

• $z\bar{z} = a^2 + b^2 = |z|^2$,

• the angle $\phi$ in Figure 5.3.1 is called an argument of z,

• $\operatorname{Re}(z) = |z| \cos\phi$,

• $\operatorname{Im}(z) = |z| \sin\phi$,

• $z = |z|(\cos\phi + i\sin\phi)$ is called the polar form of z.
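The quantities in this review can be checked numerically. The following is a minimal sketch using Python's built-in complex type; the sample value z = 3 + 4i is an assumption chosen only for illustration.

```python
import cmath
import math

# Sample complex number z = a + bi (illustrative choice, not from the text).
z = 3 + 4j

modulus = abs(z)               # |z| = sqrt(a^2 + b^2)
conjugate = z.conjugate()      # a - bi
phi = cmath.phase(z)           # an argument of z, in (-pi, pi]

# Polar form: |z|(cos(phi) + i sin(phi)) should reproduce z.
polar = modulus * (math.cos(phi) + 1j * math.sin(phi))

print(modulus, conjugate, phi, polar)
```

Note that `cmath.phase` returns one particular argument of z; any other argument differs from it by a multiple of 2π.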


Complex  Eigenvalues 


In Formula 3 of Section 5.1 we observed that the characteristic equation of a general n × n matrix A has the form

$$\lambda^n + c_1\lambda^{n-1} + \cdots + c_n = 0 \tag{1}$$

in which the highest power of $\lambda$ has a coefficient of 1. Up to now we have limited our discussion to matrices in which the solutions of 1 are real numbers. However, it is possible for the characteristic equation of a matrix A with real entries to have imaginary solutions; for example, the characteristic equation of the matrix

$$A = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}$$

is

$$\begin{vmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{vmatrix} = \lambda^2 + 1 = 0$$

which has the imaginary solutions $\lambda = i$ and $\lambda = -i$. To deal with this case we will need to explore the notion of a complex vector space and some related ideas.


Vectors  in  Cn 

A vector space in which scalars are allowed to be complex numbers is called a complex vector space. In this section we will be concerned only with the following complex generalization of the real vector space $R^n$.


DEFINITION  1 

If n is a positive integer, then a complex n-tuple is a sequence of n complex numbers $(v_1, v_2, \ldots, v_n)$. The set of all complex n-tuples is called complex n-space and is denoted by $C^n$. Scalars are complex numbers, and the operations of addition, subtraction, and scalar multiplication are performed componentwise.


The terminology used for n-tuples of real numbers applies to complex n-tuples without change. Thus, if $v_1, v_2, \ldots, v_n$ are complex numbers, then we call $v = (v_1, v_2, \ldots, v_n)$ a vector in $C^n$ and $v_1, v_2, \ldots, v_n$ its components. Some examples of vectors in $C^3$ are

$$u = (1 + i,\ -4i,\ 3 + 2i), \quad v = (0,\ i,\ 5), \quad w = \left(6 - \sqrt{2}\,i,\ 9 + \tfrac{1}{2}i,\ \pi i\right)$$

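Every vector

$$v = (v_1, v_2, \ldots, v_n) = (a_1 + b_1 i,\ a_2 + b_2 i,\ \ldots,\ a_n + b_n i)$$

in $C^n$ can be split into real and imaginary parts as

$$v = (a_1, a_2, \ldots, a_n) + i(b_1, b_2, \ldots, b_n)$$

which we also denote as

$$v = \operatorname{Re}(v) + i\operatorname{Im}(v)$$

where

$$\operatorname{Re}(v) = (a_1, a_2, \ldots, a_n) \quad \text{and} \quad \operatorname{Im}(v) = (b_1, b_2, \ldots, b_n)$$

The vector

$$\bar{v} = (\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_n) = (a_1 - b_1 i,\ a_2 - b_2 i,\ \ldots,\ a_n - b_n i)$$

is called the complex conjugate of v and can be expressed in terms of Re(v) and Im(v) as

$$\bar{v} = (a_1, a_2, \ldots, a_n) - i(b_1, b_2, \ldots, b_n) = \operatorname{Re}(v) - i\operatorname{Im}(v) \tag{2}$$

It follows that the vectors in $R^n$ can be viewed as those vectors in $C^n$ whose imaginary part is zero; or stated another way, a vector v in $C^n$ is in $R^n$ if and only if $\bar{v} = v$.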

In  this  section  we  will  also  need  to  consider  matrices  with  complex  entries,  so  henceforth  we  will  call  a matrix  A a real  matrix 
if  its  entries  are  required  to  be  real  numbers  and  a complex  matrix  if  its  entries  are  allowed  to  be  complex  numbers.  The 
standard  operations  on  real  matrices  carry  over  to  complex  matrices  without  change,  and  all  of  the  familiar  properties  of 
matrices  continue  to  hold. 

If A is a complex matrix, then Re(A) and Im(A) are the matrices formed from the real and imaginary parts of the entries of A, and $\bar{A}$ is the matrix formed by taking the complex conjugate of each entry in A.

EXAMPLE  1 Real  and  Imaginary  Parts  of  Vectors  and  Matrices 

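Let

$$v = (3 + i,\ -2i,\ 5) \quad \text{and} \quad A = \begin{bmatrix} 1 + i & -i \\ 4 & 6 - 2i \end{bmatrix}$$

Then

$$\bar{v} = (3 - i,\ 2i,\ 5), \quad \operatorname{Re}(v) = (3, 0, 5), \quad \operatorname{Im}(v) = (1, -2, 0)$$

$$\bar{A} = \begin{bmatrix} 1 - i & i \\ 4 & 6 + 2i \end{bmatrix}, \quad \operatorname{Re}(A) = \begin{bmatrix} 1 & 0 \\ 4 & 6 \end{bmatrix}, \quad \operatorname{Im}(A) = \begin{bmatrix} 1 & -1 \\ 0 & -2 \end{bmatrix}$$

$$\det(A) = \begin{vmatrix} 1 + i & -i \\ 4 & 6 - 2i \end{vmatrix} = (1 + i)(6 - 2i) - (-i)(4) = 8 + 8i$$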


Algebraic  Properties  of  the  Complex  Conjugate 

The  next  two  theorems  list  some  properties  of  complex  vectors  and  matrices  that  we  will  need  in  this  section.  Some  of  the 
proofs  are  given  as  exercises. 


THEOREM  5.3.1 

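If u and v are vectors in $C^n$, and if k is a scalar, then:

(a) $\overline{\overline{u}} = u$

(b) $\overline{ku} = \bar{k}\,\bar{u}$

(c) $\overline{u + v} = \bar{u} + \bar{v}$

(d) $\overline{u - v} = \bar{u} - \bar{v}$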


THEOREM  5.3.2 

If A is an m × k complex matrix and B is a k × n complex matrix, then:

(a) $\overline{\overline{A}} = A$

(b) $\overline{(A^T)} = (\bar{A})^T$

(c) $\overline{AB} = \bar{A}\,\bar{B}$


The  Complex  Euclidean  Inner  Product 

The following definition extends the notions of dot product and norm to $C^n$.


DEFINITION 2

If $u = (u_1, u_2, \ldots, u_n)$ and $v = (v_1, v_2, \ldots, v_n)$ are vectors in $C^n$, then the complex Euclidean inner product of u and v (also called the complex dot product) is denoted by $u \cdot v$ and is defined as

$$u \cdot v = u_1\bar{v}_1 + u_2\bar{v}_2 + \cdots + u_n\bar{v}_n \tag{3}$$

We also define the Euclidean norm on $C^n$ to be

$$\|v\| = \sqrt{v \cdot v} = \sqrt{|v_1|^2 + |v_2|^2 + \cdots + |v_n|^2} \tag{4}$$

As in the real case, we call v a unit vector in $C^n$ if $\|v\| = 1$, and we say two vectors u and v are orthogonal if $u \cdot v = 0$.

The complex conjugates in 3 ensure that $\|v\|$ is a real number, for without them the quantity $v \cdot v$ in 4 might be imaginary.


EXAMPLE  2 Complex  Euclidean  Inner  Product  and  Norm 

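Find $u \cdot v$, $v \cdot u$, $\|u\|$, and $\|v\|$ for the vectors

$$u = (1 + i,\ i,\ 3 - i) \quad \text{and} \quad v = (1 + i,\ 2,\ 4i)$$


Solution

$$u \cdot v = (1 + i)\overline{(1 + i)} + i\,\overline{(2)} + (3 - i)\overline{(4i)} = (1 + i)(1 - i) + 2i + (3 - i)(-4i) = -2 - 10i$$

$$v \cdot u = (1 + i)\overline{(1 + i)} + 2\,\overline{(i)} + (4i)\overline{(3 - i)} = (1 + i)(1 - i) - 2i + 4i(3 + i) = -2 + 10i$$

$$\|u\| = \sqrt{|1 + i|^2 + |i|^2 + |3 - i|^2} = \sqrt{2 + 1 + 10} = \sqrt{13}$$

$$\|v\| = \sqrt{|1 + i|^2 + |2|^2 + |4i|^2} = \sqrt{2 + 4 + 16} = \sqrt{22}$$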
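The computations in Example 2 can be reproduced with a few lines of code. This is a minimal sketch, assuming vectors are stored as Python lists of complex numbers; the helper names `cdot` and `cnorm` are hypothetical and introduced only for illustration.

```python
import math

def cdot(u, v):
    """Complex Euclidean inner product: sum of u_k times conj(v_k)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def cnorm(v):
    """Euclidean norm on C^n; v . v is real, so its real part is taken."""
    return math.sqrt(cdot(v, v).real)

# Vectors from Example 2.
u = [1 + 1j, 1j, 3 - 1j]
v = [1 + 1j, 2 + 0j, 4j]

print(cdot(u, v))   # u . v
print(cdot(v, u))   # v . u, the complex conjugate of u . v
print(cnorm(u) ** 2, cnorm(v) ** 2)
```

The conjugate is applied to the second argument, matching Formula 3; some software libraries conjugate the first argument instead, so the convention should be checked before comparing results.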


Recall from Table 1 of Section 3.2 that if u and v are column vectors in $R^n$, then their dot product can be expressed as

$$u \cdot v = u^T v = v^T u$$

The analogous formulas in $C^n$ are (verify)

$$u \cdot v = u^T\bar{v} = \bar{v}^T u \tag{5}$$

Example 2 reveals a major difference between the dot product on $R^n$ and the complex dot product on $C^n$. For the dot product on $R^n$ we always have $v \cdot u = u \cdot v$ (the symmetry property), but for the complex dot product the corresponding relationship is given by $u \cdot v = \overline{v \cdot u}$, which is called its antisymmetry property. The following theorem is an analog of Theorem 3.2.2.


THEOREM  5.3.3 

If u, v, and w are vectors in $C^n$, and if k is a scalar, then the complex Euclidean inner product has the following properties:

(a) $u \cdot v = \overline{v \cdot u}$ [Antisymmetry property]

(b) $u \cdot (v + w) = u \cdot v + u \cdot w$ [Distributive property]

(c) $k(u \cdot v) = (ku) \cdot v$ [Homogeneity property]

(d) $u \cdot kv = \bar{k}(u \cdot v)$ [Antihomogeneity property]

(e) $v \cdot v \geq 0$ and $v \cdot v = 0$ if and only if $v = 0$ [Positivity property]


Parts (c) and (d) of this theorem state that a scalar multiplying a complex Euclidean inner product can be regrouped with the first vector, but to regroup it with the second vector you must first take its complex conjugate. We will prove part (d), and leave the others as exercises.


Proof (d)

$$k(u \cdot v) = k\,\overline{(v \cdot u)} = \overline{\bar{k}}\;\overline{(v \cdot u)} = \overline{\bar{k}(v \cdot u)} = \overline{(\bar{k}v) \cdot u} = u \cdot (\bar{k}v)$$

To complete the proof, substitute $\bar{k}$ for k and use the fact that $\overline{\bar{k}} = k$.


Vector  Concepts  in  Cn 


Except for the use of complex scalars, the notions of linear combination, linear independence, subspace, spanning, basis, and dimension carry over without change to $C^n$.

Is $R^n$ a subspace of $C^n$? Explain.

Eigenvalues and eigenvectors are defined for complex matrices exactly as for real matrices. If A is an n × n matrix with complex entries, then the complex roots of the characteristic equation $\det(\lambda I - A) = 0$ are called complex eigenvalues of A. As in the real case, $\lambda$ is a complex eigenvalue of A if and only if there exists a nonzero vector x in $C^n$ such that $Ax = \lambda x$. Each such x is called a complex eigenvector of A corresponding to $\lambda$. The complex eigenvectors of A corresponding to $\lambda$ are the nonzero solutions of the linear system $(\lambda I - A)x = 0$, and the set of all such solutions is a subspace of $C^n$, called the eigenspace of A corresponding to $\lambda$.

The  following  theorem  states  that  if  a real  matrix  has  complex  eigenvalues,  then  those  eigenvalues  and  their  corresponding 
eigenvectors  occur  in  conjugate  pairs. 

THEOREM  5.3.4 

If $\lambda$ is an eigenvalue of a real n × n matrix A, and if x is a corresponding eigenvector, then $\bar{\lambda}$ is also an eigenvalue of A, and $\bar{x}$ is a corresponding eigenvector.

Proof

Since $\lambda$ is an eigenvalue of A and x is a corresponding eigenvector, we have

$$\overline{Ax} = \overline{\lambda x} = \bar{\lambda}\,\bar{x} \tag{6}$$

However, $\bar{A} = A$, since A has real entries, so it follows from part (c) of Theorem 5.3.2 that

$$\overline{Ax} = \bar{A}\,\bar{x} = A\bar{x} \tag{7}$$

Equations 6 and 7 together imply that

$$A\bar{x} = \overline{Ax} = \bar{\lambda}\,\bar{x}$$

in which $\bar{x} \neq 0$ (why?); this tells us that $\bar{\lambda}$ is an eigenvalue of A and $\bar{x}$ is a corresponding eigenvector.


EXAMPLE  3 Complex  Eigenvalues  and  Eigenvectors 

Find the eigenvalues and bases for the eigenspaces of

$$A = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix}$$

The characteristic polynomial of A is

$$\begin{vmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{vmatrix} = \lambda^2 + 1 = (\lambda - i)(\lambda + i)$$

so the eigenvalues of A are $\lambda = i$ and $\lambda = -i$. Note that these eigenvalues are complex conjugates, as guaranteed by Theorem 5.3.4.

To find the eigenvectors we must solve the system

$$\begin{bmatrix} \lambda + 2 & 1 \\ -5 & \lambda - 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

with $\lambda = i$ and then with $\lambda = -i$. With $\lambda = i$, this system becomes

$$\begin{bmatrix} i + 2 & 1 \\ -5 & i - 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{8}$$

We could solve this system by reducing the augmented matrix

$$\begin{bmatrix} i + 2 & 1 & 0 \\ -5 & i - 2 & 0 \end{bmatrix} \tag{9}$$

to reduced row echelon form by Gauss-Jordan elimination, though the complex arithmetic is somewhat tedious. A simpler procedure here is first to observe that the reduced row echelon form of 9 must have a row of zeros because 8 has nontrivial solutions. This being the case, each row of 9 must be a scalar multiple of the other, and hence the first row can be made into a row of zeros by adding a suitable multiple of the second row to it. Accordingly, we can simply set the entries in the first row to zero, then interchange the rows, and then multiply the new first row by $-\frac{1}{5}$ to obtain the reduced row echelon form

$$\begin{bmatrix} 1 & \frac{2}{5} - \frac{1}{5}i & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

Thus, a general solution of the system is

$$x_1 = \left(-\tfrac{2}{5} + \tfrac{1}{5}i\right)t, \quad x_2 = t$$

This tells us that the eigenspace corresponding to $\lambda = i$ is one-dimensional and consists of all complex scalar multiples of the basis vector

$$x = \begin{bmatrix} -\frac{2}{5} + \frac{1}{5}i \\ 1 \end{bmatrix} \tag{10}$$


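As a check, let us confirm that $Ax = ix$. We obtain

$$Ax = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} \begin{bmatrix} -\frac{2}{5} + \frac{1}{5}i \\ 1 \end{bmatrix} = \begin{bmatrix} -2\left(-\frac{2}{5} + \frac{1}{5}i\right) - 1 \\ 5\left(-\frac{2}{5} + \frac{1}{5}i\right) + 2 \end{bmatrix} = \begin{bmatrix} -\frac{1}{5} - \frac{2}{5}i \\ i \end{bmatrix} = ix$$

We could find a basis for the eigenspace corresponding to $\lambda = -i$ in a similar way, but the work is unnecessary, since Theorem 5.3.4 implies that

$$\bar{x} = \begin{bmatrix} -\frac{2}{5} - \frac{1}{5}i \\ 1 \end{bmatrix} \tag{11}$$

must be a basis for this eigenspace. The following computations confirm that $\bar{x}$ is an eigenvector of A corresponding to $\lambda = -i$:

$$A\bar{x} = \begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} \begin{bmatrix} -\frac{2}{5} - \frac{1}{5}i \\ 1 \end{bmatrix} = \begin{bmatrix} -2\left(-\frac{2}{5} - \frac{1}{5}i\right) - 1 \\ 5\left(-\frac{2}{5} - \frac{1}{5}i\right) + 2 \end{bmatrix} = \begin{bmatrix} -\frac{1}{5} + \frac{2}{5}i \\ -i \end{bmatrix} = -i\bar{x}$$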
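The conjugate-pair behavior described in Theorem 5.3.4 can be checked numerically for the matrix of Example 3. The sketch below uses floating-point complex arithmetic, so the residuals are compared against a small tolerance rather than exact zero; the helper names `matvec` and `residual` are hypothetical.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of entries)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def residual(A, lam, x):
    """Largest entry of |Ax - lam*x|; near zero for an eigenpair."""
    Ax = matvec(A, x)
    return max(abs(ax - lam * xi) for ax, xi in zip(Ax, x))

A = [[-2, -1], [5, 2]]
lam = 1j
x = [complex(-2 / 5, 1 / 5), complex(1, 0)]

# Conjugating both the eigenvalue and the eigenvector gives another eigenpair.
x_bar = [xi.conjugate() for xi in x]
r1 = residual(A, lam, x)
r2 = residual(A, lam.conjugate(), x_bar)

print(r1, r2)
```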


Since a number of our subsequent examples will involve 2 × 2 matrices with real entries, it will be useful to discuss some general results about the eigenvalues of such matrices. Observe first that the characteristic polynomial of the matrix

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

is

$$\det(\lambda I - A) = \begin{vmatrix} \lambda - a & -b \\ -c & \lambda - d \end{vmatrix} = (\lambda - a)(\lambda - d) - bc = \lambda^2 - (a + d)\lambda + (ad - bc)$$

We can express this in terms of the trace and determinant of A as

$$\det(\lambda I - A) = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A) \tag{12}$$

from which it follows that the characteristic equation of A is

$$\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0 \tag{13}$$

Now recall from algebra that if $ax^2 + bx + c = 0$ is a quadratic equation with real coefficients, then the discriminant $b^2 - 4ac$ determines the nature of the roots:

$b^2 - 4ac > 0$ [Two distinct real roots]

$b^2 - 4ac = 0$ [One repeated real root]

$b^2 - 4ac < 0$ [Two conjugate imaginary roots]

Applying this to 13 with $a = 1$, $b = -\operatorname{tr}(A)$, and $c = \det(A)$ yields the following theorem.


Olga  Taussky-Todd  (1906-1995) 


Olga  Taussky-Todd  was  one  of  the  pioneering  women  in  matrix  analysis  and  the  first  woman 
appointed  to  the  faculty  at  the  California  Institute  of  Technology.  She  worked  at  the  National  Physical  Laboratory  in 
London  during  World  War  II,  where  she  was  assigned  to  study  flutter  in  supersonic  aircraft.  While  there,  she  realized 
that  some  results  about  the  eigenvalues  of  a certain  6x6  complex  matrix  could  be  used  to  answer  key  questions  about 
the  flutter  problem  that  would  otherwise  have  required  laborious  calculation.  After  World  War  II  Olga  Taussky-Todd 
continued  her  work  on  matrix-related  subjects  and  helped  to  draw  many  known  but  disparate  results  about  matrices 
into  the  coherent  subject  that  we  now  call  matrix  theory. 

[Image:  Courtesy  of  the  Archives,  California  Institute  of  Technology ] 


THEOREM  5.3.5 

If A is a 2 × 2 matrix with real entries, then the characteristic equation of A is $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$ and

(a) A has two distinct real eigenvalues if $\operatorname{tr}(A)^2 - 4\det(A) > 0$;

(b) A has one repeated real eigenvalue if $\operatorname{tr}(A)^2 - 4\det(A) = 0$;

(c) A has two complex conjugate eigenvalues if $\operatorname{tr}(A)^2 - 4\det(A) < 0$.


EXAMPLE  4 Eigenvalues  of  a 2 x 2 Matrix 


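In each part, use Formula 13 for the characteristic equation to find the eigenvalues of

$$\text{(a) } A = \begin{bmatrix} 2 & 2 \\ -1 & 5 \end{bmatrix} \qquad \text{(b) } A = \begin{bmatrix} 0 & -1 \\ 1 & 2 \end{bmatrix} \qquad \text{(c) } A = \begin{bmatrix} 2 & 3 \\ -3 & 2 \end{bmatrix}$$


Solution

(a) We have $\operatorname{tr}(A) = 7$ and $\det(A) = 12$, so the characteristic equation of A is

$$\lambda^2 - 7\lambda + 12 = 0$$

Factoring yields $(\lambda - 4)(\lambda - 3) = 0$, so the eigenvalues of A are $\lambda = 4$ and $\lambda = 3$.

(b) We have $\operatorname{tr}(A) = 2$ and $\det(A) = 1$, so the characteristic equation of A is

$$\lambda^2 - 2\lambda + 1 = 0$$

Factoring this equation yields $(\lambda - 1)^2 = 0$, so $\lambda = 1$ is the only eigenvalue of A; it has algebraic multiplicity 2.

(c) We have $\operatorname{tr}(A) = 4$ and $\det(A) = 13$, so the characteristic equation of A is

$$\lambda^2 - 4\lambda + 13 = 0$$

Solving this equation by the quadratic formula yields

$$\lambda = \frac{4 \pm \sqrt{(-4)^2 - 4(13)}}{2} = \frac{4 \pm \sqrt{-36}}{2} = 2 \pm 3i$$

Thus, the eigenvalues of A are $\lambda = 2 + 3i$ and $\lambda = 2 - 3i$.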
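The procedure used in Example 4 (form the trace and determinant, then solve the quadratic in Formula 13) can be sketched in code. This is only an illustration under the assumption that the matrix is given as a nested list of real numbers; the function name `eig2x2` is hypothetical.

```python
import cmath

def eig2x2(A):
    """Eigenvalues of a real 2x2 matrix from its trace and determinant.

    Returns (lambda_1, lambda_2, discriminant), where the discriminant
    tr(A)^2 - 4 det(A) determines the nature of the eigenvalues.
    """
    (a, b), (c, d) = A
    tr = a + d
    det = a * d - b * c
    disc = tr * tr - 4 * det
    root = cmath.sqrt(disc)      # complex square root handles disc < 0
    return (tr + root) / 2, (tr - root) / 2, disc

for A in ([[2, 2], [-1, 5]], [[0, -1], [1, 2]], [[2, 3], [-3, 2]]):
    lam1, lam2, disc = eig2x2(A)
    kind = "distinct real" if disc > 0 else "repeated real" if disc == 0 else "complex conjugate"
    print(A, lam1, lam2, kind)
```

The eigenvalues are returned as Python complex numbers even when they are real, in which case the imaginary part is zero.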


Symmetric  Matrices  Have  Real  Eigenvalues 

Our  next  result,  which  is  concerned  with  the  eigenvalues  of  real  symmetric  matrices,  is  important  in  a wide  variety  of 
applications.  The  key  to  its  proof  is  to  think  of  a real  symmetric  matrix  as  a complex  matrix  whose  entries  have  an  imaginary 
part  of  zero. 


THEOREM  5.3.6 

If  A is  a real  symmetric  matrix,  then  A has  real  eigenvalues. 


Suppose that $\lambda$ is an eigenvalue of A and x is a corresponding eigenvector, where we allow for the possibility that $\lambda$ is complex and x is in $C^n$. Thus,

$$Ax = \lambda x$$

where $x \neq 0$. If we multiply both sides of this equation by $\bar{x}^T$ and use the fact that

$$\bar{x}^T A x = \bar{x}^T(\lambda x) = \lambda(\bar{x}^T x) = \lambda(x \cdot x) = \lambda\|x\|^2$$

then we obtain

$$\lambda = \frac{\bar{x}^T A x}{\|x\|^2}$$

Since the denominator in this expression is real, we can prove that $\lambda$ is real by showing that

$$\overline{\bar{x}^T A x} = \bar{x}^T A x \tag{14}$$

But A is symmetric and has real entries, so it follows from the second equality in 5 and properties of the conjugate that

$$\overline{\bar{x}^T A x} = \overline{\bar{x}}^T\,\overline{Ax} = x^T\,\overline{Ax} = (\overline{Ax})^T x = (\bar{A}\,\bar{x})^T x = (A\bar{x})^T x = \bar{x}^T A^T x = \bar{x}^T A x$$


A Geometric  Interpretation  of  Complex  Eigenvalues 

The  following  theorem  is  the  key  to  understanding  the  geometric  significance  of  complex  eigenvalues  of  real  2x2  matrices. 


THEOREM  5.3.7 

The eigenvalues of the real matrix

$$C = \begin{bmatrix} a & -b \\ b & a \end{bmatrix} \tag{15}$$

are $\lambda = a \pm bi$. If a and b are not both zero, then this matrix can be factored as

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} \tag{16}$$

where $\phi$ is the angle from the positive x-axis to the ray that joins the origin to the point (a, b) (Figure 5.3.2).


Geometrically, this theorem states that multiplication by a matrix of form 15 can be viewed as a rotation through the angle $\phi$ followed by a scaling with factor $|\lambda|$ (Figure 5.3.3).


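The characteristic equation of C is $(\lambda - a)^2 + b^2 = 0$ (verify), from which it follows that the eigenvalues of C are $\lambda = a \pm bi$. Assuming that a and b are not both zero, let $\phi$ be the angle from the positive x-axis to the ray that joins the origin to the point (a, b). The angle $\phi$ is an argument of the eigenvalue $\lambda = a + bi$, so we see from Figure 5.3.2 that

$$a = |\lambda|\cos\phi \quad \text{and} \quad b = |\lambda|\sin\phi$$

It follows from this that the matrix in 15 can be written as

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \frac{a}{|\lambda|} & -\frac{b}{|\lambda|} \\ \frac{b}{|\lambda|} & \frac{a}{|\lambda|} \end{bmatrix} = \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}$$
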
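The factorization of a matrix of form 15 into a scaling and a rotation can be sketched numerically. The code below is an illustration only; the function names `scale_and_angle` and `rebuild` are hypothetical, and the inputs a and b are assumed not to be both zero.

```python
import math

def scale_and_angle(a, b):
    """Return (|lambda|, phi) for the matrix [[a, -b], [b, a]].

    |lambda| is the modulus of a + bi, and phi is the argument of
    a + bi chosen in the interval (-pi, pi].
    """
    r = math.hypot(a, b)
    phi = math.atan2(b, a)
    return r, phi

def rebuild(r, phi):
    """Form the product of the scaling matrix rI and the rotation through phi."""
    c, s = math.cos(phi), math.sin(phi)
    return [[r * c, -r * s], [r * s, r * c]]

# Illustrative input: a = 1, b = 1, so the matrix is [[1, -1], [1, 1]].
r, phi = scale_and_angle(1.0, 1.0)
C = rebuild(r, phi)
print(r, phi, C)
```

Using `math.atan2(b, a)` rather than `math.atan(b / a)` avoids division by zero when a = 0 and selects the correct quadrant for the point (a, b).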

The  following  theorem,  whose  proof  is  considered  in  the  exercises,  shows  that  every  real  2x2  matrix  with  complex 
eigenvalues  is  similar  to  a matrix  of  form  15. 


THEOREM  5.3.8 

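Let A be a real 2 × 2 matrix with complex eigenvalues $\lambda = a \pm bi$ (where $b \neq 0$). If x is an eigenvector of A corresponding to $\lambda = a - bi$, then the matrix $P = [\operatorname{Re}(x) \ \ \operatorname{Im}(x)]$ is invertible and

$$A = P \begin{bmatrix} a & -b \\ b & a \end{bmatrix} P^{-1} \tag{17}$$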


EXAMPLE  5 A Matrix  Factorization  Using  Complex  Eigenvalues 

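Factor the matrix in Example 3 into form 17 using the eigenvalue $\lambda = -i$ and the corresponding eigenvector that was given in 11.

For consistency with the notation in Theorem 5.3.8, let us denote the eigenvector in 11 that corresponds to $\lambda = -i$ by x (rather than $\bar{x}$ as before). For this $\lambda$ and x we have

$$a = 0, \quad b = 1, \quad \operatorname{Re}(x) = \begin{bmatrix} -\frac{2}{5} \\ 1 \end{bmatrix}, \quad \operatorname{Im}(x) = \begin{bmatrix} -\frac{1}{5} \\ 0 \end{bmatrix}$$

Thus,

$$P = [\operatorname{Re}(x) \ \ \operatorname{Im}(x)] = \begin{bmatrix} -\frac{2}{5} & -\frac{1}{5} \\ 1 & 0 \end{bmatrix}$$

so A can be factored in form 17 as

$$\begin{bmatrix} -2 & -1 \\ 5 & 2 \end{bmatrix} = \begin{bmatrix} -\frac{2}{5} & -\frac{1}{5} \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -5 & -2 \end{bmatrix}$$

You may want to confirm this by multiplying out the right side.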
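One way to carry out the suggested confirmation is with exact rational arithmetic. The following sketch multiplies out the three factors of Example 5 using Python's `fractions` module; the helper name `matmul` is hypothetical.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

P = [[F(-2, 5), F(-1, 5)], [F(1), F(0)]]    # [Re(x)  Im(x)]
C = [[F(0), F(-1)], [F(1), F(0)]]           # form 15 with a = 0, b = 1
P_inv = [[F(0), F(1)], [F(-5), F(-2)]]      # inverse of P

product = matmul(matmul(P, C), P_inv)
print(product)
```

Because the entries are stored as fractions, the comparison with the original matrix is exact and no rounding tolerance is needed.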


A Geometric  Interpretation  of  Theorem  5.3.8 


To clarify what Theorem 5.3.8 says geometrically, let us denote the matrices on the right side of 16 by S and $R_\phi$, respectively, and then use 16 to rewrite 17 as

$$A = PSR_\phi P^{-1} = P \begin{bmatrix} |\lambda| & 0 \\ 0 & |\lambda| \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix} P^{-1} \tag{18}$$

If we now view P as the transition matrix from the basis $B = \{\operatorname{Re}(x), \operatorname{Im}(x)\}$ to the standard basis, then 18 tells us that computing a product $Ax_0$ can be broken down into a three-step process:

Step 1 Map $x_0$ from standard coordinates into B-coordinates by forming the product $P^{-1}x_0$.

Step 2 Rotate and scale the vector $P^{-1}x_0$ by forming the product $SR_\phi P^{-1}x_0$.

Step 3 Map the rotated and scaled vector back to standard coordinates to obtain $Ax_0 = PSR_\phi P^{-1}x_0$.


Power  Sequences 

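There are many problems in which one is interested in how successive applications of a matrix transformation affect a specific vector. For example, if A is the standard matrix for an operator on $R^n$ and $x_0$ is some fixed vector in $R^n$, then one might be interested in the behavior of the power sequence

$$x_0,\ Ax_0,\ A^2x_0,\ \ldots,\ A^kx_0,\ \ldots$$

For example, if

$$A = \begin{bmatrix} \frac{1}{2} & \frac{3}{4} \\ -\frac{3}{5} & \frac{11}{10} \end{bmatrix} \quad \text{and} \quad x_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

then with the help of a computer or calculator one can show that the first four terms in the power sequence are

$$x_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad Ax_0 = \begin{bmatrix} 1.25 \\ 0.5 \end{bmatrix}, \quad A^2x_0 = \begin{bmatrix} 1.0 \\ -0.2 \end{bmatrix}, \quad A^3x_0 = \begin{bmatrix} 0.35 \\ -0.82 \end{bmatrix}$$
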

With  the  help  of  MATLAB  or  a computer  algebra  system  one  can  show  that  if  the  first  100  terms  are  plotted  as  ordered  pairs 
(x,y),  then  the  points  move  along  the  elliptical  path  shown  in  Figure  5.3.4a. 
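The first few terms of this power sequence can be generated as follows. This is a minimal sketch using exact fractions; the helper name `matvec` is hypothetical, and the choice to compute only four terms is for comparison with the values listed above.

```python
from fractions import Fraction as F

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

A = [[F(1, 2), F(3, 4)], [F(-3, 5), F(11, 10)]]
x = [F(1), F(1)]

# Collect x0, A x0, A^2 x0, A^3 x0 by repeated multiplication.
terms = [x]
for _ in range(3):
    x = matvec(A, x)
    terms.append(x)

for t in terms:
    print([float(entry) for entry in t])
```

Each term is obtained from the previous one by a single matrix-vector product, so no matrix powers need to be formed explicitly.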
Figure 5.3.4

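To understand why the points move along an elliptical path, we will need to examine the eigenvalues and eigenvectors of A. We leave it for you to show that the eigenvalues of A are $\lambda = \frac{4}{5} \pm \frac{3}{5}i$ and that corresponding eigenvectors are

$$\lambda_1 = \tfrac{4}{5} - \tfrac{3}{5}i:\ \ v_1 = \left(\tfrac{1}{2} + i,\ 1\right) \quad \text{and} \quad \lambda_2 = \tfrac{4}{5} + \tfrac{3}{5}i:\ \ v_2 = \left(\tfrac{1}{2} - i,\ 1\right)$$

If we take $\lambda = \lambda_1 = \frac{4}{5} - \frac{3}{5}i$ and $x = v_1 = \left(\frac{1}{2} + i, 1\right)$ in 17 and use the fact that $|\lambda| = 1$, then we obtain the factorization

$$\underbrace{\begin{bmatrix} \frac{1}{2} & \frac{3}{4} \\ -\frac{3}{5} & \frac{11}{10} \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} \frac{1}{2} & 1 \\ 1 & 0 \end{bmatrix}}_{P} \underbrace{\begin{bmatrix} \frac{4}{5} & -\frac{3}{5} \\ \frac{3}{5} & \frac{4}{5} \end{bmatrix}}_{R_\phi} \underbrace{\begin{bmatrix} 0 & 1 \\ 1 & -\frac{1}{2} \end{bmatrix}}_{P^{-1}} \tag{19}$$

where $R_\phi$ is a rotation about the origin through the angle $\phi$ whose tangent is

$$\tan\phi = \frac{\sin\phi}{\cos\phi} = \frac{3/5}{4/5} = \frac{3}{4} \quad \left(\phi = \tan^{-1}\tfrac{3}{4} \approx 36.9^\circ\right)$$

The matrix P in 19 is the transition matrix from the basis

$$B = \{\operatorname{Re}(x), \operatorname{Im}(x)\} = \left\{\left(\tfrac{1}{2}, 1\right), (1, 0)\right\}$$

to the standard basis, and $P^{-1}$ is the transition matrix from the standard basis to the basis B (Figure 5.3.5). Next, observe that if n is a positive integer, then 19 implies that

$$A^nx_0 = (PR_\phi P^{-1})^nx_0 = PR_\phi^nP^{-1}x_0$$

so the product $A^nx_0$ can be computed by first mapping $x_0$ into the point $P^{-1}x_0$ in B-coordinates, then multiplying by $R_\phi^n$ to rotate this point about the origin through the angle $n\phi$, and then multiplying $R_\phi^nP^{-1}x_0$ by P to map the resulting point back to standard coordinates. We can now see what is happening geometrically: In B-coordinates each successive multiplication by A causes the point to advance through an angle $\phi$, thereby tracing a circular orbit about the origin. However, the basis B is skewed (not orthogonal), so when the points on the circular orbit are transformed back to standard coordinates, the effect is to distort the circular orbit into the elliptical orbit traced by $A^nx_0$ (Figure 5.3.4b). Here are the computations for the first step (successive steps are illustrated in Figure 5.3.4c):

$$\begin{aligned}
\begin{bmatrix} \frac{1}{2} & \frac{3}{4} \\ -\frac{3}{5} & \frac{11}{10} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix}
&= \begin{bmatrix} \frac{1}{2} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \frac{4}{5} & -\frac{3}{5} \\ \frac{3}{5} & \frac{4}{5} \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & -\frac{1}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\
&= \begin{bmatrix} \frac{1}{2} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \frac{4}{5} & -\frac{3}{5} \\ \frac{3}{5} & \frac{4}{5} \end{bmatrix} \begin{bmatrix} 1 \\ \frac{1}{2} \end{bmatrix} && [x_0 \text{ is mapped to } B\text{-coordinates.}] \\
&= \begin{bmatrix} \frac{1}{2} & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \frac{1}{2} \\ 1 \end{bmatrix} && [\text{The point } (1, \tfrac{1}{2}) \text{ is rotated through the angle } \phi.] \\
&= \begin{bmatrix} \frac{5}{4} \\ \frac{1}{2} \end{bmatrix} && [\text{The point } (\tfrac{1}{2}, 1) \text{ is mapped to standard coordinates.}]
\end{aligned}$$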
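The claim that the orbit is circular in B-coordinates can be tested directly: the B-coordinate vectors of the points in the power sequence should all lie at the same distance from the origin. The sketch below checks this with exact fractions for the first ten terms; the number of terms and the helper name `matvec` are assumptions made for illustration.

```python
from fractions import Fraction as F

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

A = [[F(1, 2), F(3, 4)], [F(-3, 5), F(11, 10)]]
P_inv = [[F(0), F(1)], [F(1), F(-1, 2)]]    # standard -> B-coordinates

x = [F(1), F(1)]
radii_squared = []
for _ in range(10):
    y = matvec(P_inv, x)                    # B-coordinates of current point
    radii_squared.append(y[0] ** 2 + y[1] ** 2)
    x = matvec(A, x)                        # next term of the power sequence

print(radii_squared)
```

Since the rotation matrix in 19 has rational entries, the squared radius is preserved exactly, so every entry of `radii_squared` equals the squared length of the first B-coordinate vector.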


Concept  Review 

Real  part  of  z 
Imaginary  part  of  z 
Modulus  of  z 
Complex  conjugate  of  z 
Argument  of  z 
Polar  form  of  z 
Complex  vector  space 
Complex n-tuple
Complex  n-space 
Real  matrix 
Complex  matrix 

Complex  Euclidean  inner  product 
Euclidean norm on $C^n$


Antisymmetry  property 
Complex  eigenvalue 
Complex  eigenvector 
Eigenspace in $C^n$

Discriminant 

Skills 

Find  the  real  part,  imaginary  part,  and  complex  conjugate  of  a complex  matrix  or  vector. 

Find  the  determinant  of  a complex  matrix. 

Find  complex  inner  products  and  norms  of  complex  vectors. 

Find  the  eigenvalues  and  bases  for  the  eigenspaces  of  complex  matrices. 

Factor  a 2 x 2 real  matrix  with  complex  eigenvalues  into  a product  of  a scaling  matrix  and  a rotation  matrix. 


Exercise  Set  5.3 


In Exercises 1-2, find $\bar{u}$, Re(u), Im(u), and $\|u\|$.

1. $u = (2 - i,\ 4i,\ 1 + i)$

Answer:

$\bar{u} = (2 + i,\ -4i,\ 1 - i)$, Re(u) = (2, 0, 1), Im(u) = (−1, 4, 1), $\|u\| = \sqrt{23}$

2. $u = (6,\ 1 + 4i,\ 6 - 2i)$

In Exercises 3-4, show that u, v, and k satisfy Theorem 5.3.1.

3. $u = (3 - 4i,\ 2 + i,\ -6i)$, $v = (1 + i,\ 2 - i,\ 4)$, $k = i$

4. $u = (6,\ 1 + 4i,\ 6 - 2i)$, $v = (4,\ 3 + 2i,\ i - 3)$, $k = -i$

5. Solve the equation $ix - 3v = \bar{u}$ for x, where u and v are the vectors in Exercise 3.


Answer:

$x = (7 - 6i,\ -4 - 8i,\ 6 - 12i)$

6. Solve the equation $(1 + i)x + 2u = v$ for x, where u and v are the vectors in Exercise 4.
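In Exercises 7-8, find $\bar{A}$, Re(A), Im(A), det(A), and tr(A).

7. $A = \begin{bmatrix} -5i & 4 \\ 2 - i & 1 + 5i \end{bmatrix}$

Answer:

$$\bar{A} = \begin{bmatrix} 5i & 4 \\ 2 + i & 1 - 5i \end{bmatrix}, \quad \operatorname{Re}(A) = \begin{bmatrix} 0 & 4 \\ 2 & 1 \end{bmatrix}, \quad \operatorname{Im}(A) = \begin{bmatrix} -5 & 0 \\ -1 & 5 \end{bmatrix}, \quad \det(A) = 17 - i, \quad \operatorname{tr}(A) = 1$$

8. $A = \begin{bmatrix} 4i & 2 - 3i \\ 2 + 3i & 1 \end{bmatrix}$

9. Let A be the matrix given in Exercise 7, and let B be the matrix

$$B = \begin{bmatrix} 1 - i \\ 2i \end{bmatrix}$$

Confirm that these matrices have the properties stated in Theorem 5.3.2.

10. Let A be the matrix given in Exercise 8, and let B be the matrix

$$B = \begin{bmatrix} 5i \\ 1 - 4i \end{bmatrix}$$

Confirm that these matrices have the properties stated in Theorem 5.3.2.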

In Exercises 11-12, compute $u \cdot v$, $u \cdot w$, and $v \cdot w$, and show that the vectors satisfy Formula 5 and parts (a), (b), and (c) of Theorem 5.3.3.


11. $u = (i,\ 2i,\ 3)$, $v = (4,\ -2i,\ 1 + i)$, $w = (2 - i,\ 2i,\ 5 + 3i)$, $k = 2i$

Answer:

$u \cdot v = -1 + i$, $u \cdot w = 18 - 7i$, $v \cdot w = 12 + 6i$

12. $u = (1 + i,\ 4,\ 3i)$, $v = (3,\ -4i,\ 2 + 3i)$, $w = (1 - i,\ 4i,\ 4 - 5i)$,

13.  Compute  (u  • v)  — w ■ u for  the  vectors  u,  v,  and  w in  Exercise  11. 


Answer: 

— 11  — 14i 


14.  Compute  (m  • w)  4=  (||u||v)  • u for  the  vectors  u,  v,  and  w in  Exercise  12. 
In  Exercises  15-18,  find  the  eigenvalues  and  bases  for  the  eigenspaces  of  A. 


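15. $A = \begin{bmatrix} 4 & -5 \\ 1 & 0 \end{bmatrix}$

Answer:

$$\lambda_1 = 2 - i,\ x_1 = \begin{bmatrix} 2 - i \\ 1 \end{bmatrix}; \quad \lambda_2 = 2 + i,\ x_2 = \begin{bmatrix} 2 + i \\ 1 \end{bmatrix}$$

16. $A = \begin{bmatrix} -1 & -5 \\ 4 & 7 \end{bmatrix}$

17. $A = \begin{bmatrix} 5 & -2 \\ 1 & 3 \end{bmatrix}$

Answer:

$$\lambda_1 = 4 - i,\ x_1 = \begin{bmatrix} 1 - i \\ 1 \end{bmatrix}; \quad \lambda_2 = 4 + i,\ x_2 = \begin{bmatrix} 1 + i \\ 1 \end{bmatrix}$$

18. $A = \begin{bmatrix} 8 & 6 \\ -3 & 2 \end{bmatrix}$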

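In Exercises 19-22, each matrix C has form 15. Theorem 5.3.7 implies that C is the product of a scaling matrix with factor $|\lambda|$ and a rotation matrix with angle $\phi$. Find $|\lambda|$ and $\phi$ for which $-\pi < \phi \leq \pi$.

19. $C = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$

Answer:

$|\lambda| = \sqrt{2}$, $\phi = \frac{\pi}{4}$

20. $C = \begin{bmatrix} 0 & 5 \\ -5 & 0 \end{bmatrix}$

21. $C = \begin{bmatrix} 1 & \sqrt{3} \\ -\sqrt{3} & 1 \end{bmatrix}$

Answer:

$|\lambda| = 2$, $\phi = -\frac{\pi}{3}$

22. $C = \begin{bmatrix} \sqrt{2} & \sqrt{2} \\ -\sqrt{2} & \sqrt{2} \end{bmatrix}$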


In Exercises 23-26, find an invertible matrix P and a matrix C of form 15 such that $A = PCP^{-1}$.



II 

"-1 

4 

Answer: 

p= 

Csl 

1 

24-a= 

'4 

1 

25-a= 

CO  CO 

1 

J 1 1 

Answer: 

p= 

1 

-1 

p\ 

h* 

II 

■5 

1 

, c = 


3 -2 
2 3 


, C = 


5 -3 
3 5 


27. Find all complex scalars k, if any, for which u and v are orthogonal in $C^3$.

(a) $u = (2i,\ i,\ 3i)$, $v = (i,\ 6i,\ k)$

(b) $u = (k,\ k,\ 1 + i)$, $v = (1,\ -1,\ 1 - i)$


Answer:

(a) $k = -\frac{8}{3}i$;

(b) None

28. Show that if A is a real n × n matrix and x is a column vector in $C^n$, then $\operatorname{Re}(Ax) = A(\operatorname{Re}(x))$ and $\operatorname{Im}(Ax) = A(\operatorname{Im}(x))$.

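29. The matrices

$$\sigma_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \sigma_2 = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}, \quad \sigma_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$$

called Pauli spin matrices, are used in quantum mechanics to study particle spin. The Dirac matrices, which are also used in quantum mechanics, are expressed in terms of the Pauli spin matrices and the 2 × 2 identity matrix $I_2$ as

$$\beta = \begin{bmatrix} I_2 & 0 \\ 0 & -I_2 \end{bmatrix}, \quad \alpha_x = \begin{bmatrix} 0 & \sigma_1 \\ \sigma_1 & 0 \end{bmatrix}, \quad \alpha_y = \begin{bmatrix} 0 & \sigma_2 \\ \sigma_2 & 0 \end{bmatrix}, \quad \alpha_z = \begin{bmatrix} 0 & \sigma_3 \\ \sigma_3 & 0 \end{bmatrix}$$

(a) Show that $\beta^2 = \alpha_x^2 = \alpha_y^2 = \alpha_z^2$.

(b) Matrices A and B for which $AB = -BA$ are said to be anticommutative. Show that the Dirac matrices are anticommutative.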


30. If k is a real scalar and v is a vector in $R^n$, then Theorem 3.2.1 states that $\|kv\| = |k|\,\|v\|$. Is this relationship also true if k is a complex scalar and v is a vector in $C^n$? Justify your answer.

31.  Prove  part  ( c)  of  Theorem  5.3.1. 

32.  Prove  Theorem  5.3.2. 

33. Prove that if u and v are vectors in $C^n$, then

$$u \cdot v = \tfrac{1}{4}\|u + v\|^2 - \tfrac{1}{4}\|u - v\|^2 + \tfrac{i}{4}\|u + iv\|^2 - \tfrac{i}{4}\|u - iv\|^2$$


34. It follows from Theorem 5.3.7 that the eigenvalues of the rotation matrix

$$R_\phi = \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}$$

are $\lambda = \cos\phi \pm i\sin\phi$. Prove that if x is an eigenvector corresponding to either eigenvalue, then Re(x) and Im(x) are orthogonal and have the same length. [Note: This implies that $P = [\operatorname{Re}(x) \ \ \operatorname{Im}(x)]$ is a real scalar multiple of an orthogonal matrix.]


35. The two parts of this exercise lead you through a proof of Theorem 5.3.8.

(a) For notational simplicity, let

$$M = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$$

and let u = Re(x) and v = Im(x), so $P = [u \mid v]$. Show that the relationship $Ax = \lambda x$ implies that

$$Ax = (au + bv) + i(-bu + av)$$

and then equate real and imaginary parts in this equation to show that

$$AP = [Au \mid Av] = [au + bv \mid -bu + av] = PM$$

(b) Show that P is invertible, thereby completing the proof, since the result in part (a) implies that $A = PMP^{-1}$. [Hint: If P is not invertible, then one of its column vectors is a real scalar multiple of the other, say $v = cu$. Substitute this into the equations $Au = au + bv$ and $Av = -bu + av$ obtained in part (a), and show that $(1 + c^2)bu = 0$. Finally, show that this leads to a contradiction, thereby proving that P is invertible.]


36. In this problem you will prove the complex analog of the Cauchy-Schwarz inequality.

(a) Prove: If k is a complex number, and u and v are vectors in $C^n$, then

$$(u - kv) \cdot (u - kv) = u \cdot u - \bar{k}(u \cdot v) - k\,\overline{(u \cdot v)} + k\bar{k}(v \cdot v)$$

(b) Use the result in part (a) to prove that

$$0 \leq u \cdot u - \bar{k}(u \cdot v) - k\,\overline{(u \cdot v)} + k\bar{k}(v \cdot v)$$

(c) Take $k = (u \cdot v)/(v \cdot v)$ in part (b) to prove that

$$|u \cdot v| \leq \|u\|\,\|v\|$$


True-False  Exercises 

In parts (a)-(f) determine whether the statement is true or false, and justify your answer.

(a) There is a real 5 × 5 matrix with no real eigenvalues.


Answer: 


False 

(b) The eigenvalues of a 2 × 2 complex matrix are the solutions of the equation $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$.

Answer: 

True 

(c)  Matrices  that  have  the  same  complex  eigenvalues  with  the  same  algebraic  multiplicities  have  the  same  trace. 
Answer: 

False 

(d) If $\lambda$ is a complex eigenvalue of a real matrix A with a corresponding complex eigenvector v, then $\bar{\lambda}$ is a complex eigenvalue of A and $\bar{v}$ is a complex eigenvector of A corresponding to $\bar{\lambda}$.

Answer: 

True 

(e)  Every  eigenvalue  of  a complex  symmetric  matrix  is  real. 

Answer: 

False 

(f) If a 2 × 2 real matrix A has complex eigenvalues and $x_0$ is a vector in $R^2$, then the vectors $x_0, Ax_0, A^2x_0, \ldots, A^nx_0, \ldots$ lie on an ellipse.

Answer: 

False 
5.4 Differential Equations

Many  laws  of  physics,  chemistry,  biology,  engineering,  and  economics  are  described  in  terms  of  “differential 
equations” — that  is,  equations  involving  functions  and  their  derivatives.  In  this  section  we  will  illustrate  one  way  in 
which  linear  algebra,  eigenvalues  and  eigenvectors  can  be  applied  to  solving  systems  of  differential  equations. 
Calculus  is  a prerequisite  for  this  section. 


Terminology 

Recall from calculus that a differential equation is an equation involving unknown functions and their derivatives.
The order of a differential equation is the order of the highest derivative it contains. The simplest differential
equations are the first-order equations of the form

$$y' = ay \tag{1}$$

where $y = f(x)$ is an unknown differentiable function to be determined, $y' = dy/dx$ is its derivative, and $a$ is a
constant. As with most differential equations, this equation has infinitely many solutions; they are the functions of the
form

$$y = ce^{ax} \tag{2}$$

where $c$ is an arbitrary constant. That every function of this form is a solution of 1 follows from the computation

$$y' = cae^{ax} = ay$$

and that these are the only solutions is shown in the exercises. Accordingly, we call 2 the general solution of 1. As an
example, the general solution of the differential equation $y' = 5y$ is

$$y = ce^{5x} \tag{3}$$

Often, a physical problem that leads to a differential equation imposes some conditions that enable us to isolate one
particular solution from the general solution. For example, if we require that solution 3 of the equation $y' = 5y$
satisfy the added condition

$$y(0) = 6 \tag{4}$$

(that is, $y = 6$ when $x = 0$), then on substituting these values in 3, we obtain $6 = ce^{0} = c$, from which we conclude
that

$$y = 6e^{5x}$$

is the only solution of $y' = 5y$ that satisfies 4.

A condition such as 4, which specifies the value of the general solution at a point, is called an initial condition, and
the problem of solving a differential equation subject to an initial condition is called an initial-value problem.
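
The initial-value problem above can also be checked numerically. The following is a minimal sketch, not part of the text: the helper name `derivative` and the sample points are illustrative assumptions, and the derivative is only approximated by a central difference.

```python
import math

def y(x):
    """Candidate solution y = 6 e^{5x} of y' = 5y, y(0) = 6."""
    return 6.0 * math.exp(5.0 * x)

def derivative(f, x, h=1e-6):
    """Central-difference approximation to f'(x) (illustrative helper)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# The initial condition y(0) = 6 holds exactly.
initial_value = y(0.0)

# Relative residual of y' = 5y at a few sample points; each should be tiny.
residuals = [
    abs(derivative(y, x) - 5.0 * y(x)) / abs(5.0 * y(x))
    for x in (0.0, 0.1, 0.5, 1.0)
]
```

A small residual does not prove that the formula is a solution, but it is a quick sanity check on the constant obtained from the initial condition.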


First-Order  Linear  Systems 


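In this section we will be concerned with solving systems of differential equations of the form

$$\begin{aligned}
y_1' &= a_{11}y_1 + a_{12}y_2 + \cdots + a_{1n}y_n \\
y_2' &= a_{21}y_1 + a_{22}y_2 + \cdots + a_{2n}y_n \\
&\;\;\vdots \\
y_n' &= a_{n1}y_1 + a_{n2}y_2 + \cdots + a_{nn}y_n
\end{aligned} \tag{5}$$

where $y_1 = f_1(x), y_2 = f_2(x), \ldots, y_n = f_n(x)$ are functions to be determined, and the $a_{ij}$'s are constants. In
matrix notation, 5 can be written as

$$\begin{bmatrix} y_1' \\ y_2' \\ \vdots \\ y_n' \end{bmatrix} =
\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

or, more briefly, as

$$\mathbf{y}' = A\mathbf{y} \tag{6}$$

where the notation $\mathbf{y}'$ denotes the vector obtained by differentiating each component of $\mathbf{y}$.

A system of differential equations of form 5 is
called a first-order linear system.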


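EXAMPLE 1 Solution of a Linear System with Initial Conditions

(a) Write the following system in matrix form:

$$\begin{aligned} y_1' &= 3y_1 \\ y_2' &= -2y_2 \\ y_3' &= 5y_3 \end{aligned} \tag{7}$$

(b) Solve the system.

(c) Find a solution of the system that satisfies the initial conditions $y_1(0) = 1$, $y_2(0) = 4$, and $y_3(0) = -2$.


Solution

(a)

$$\begin{bmatrix} y_1' \\ y_2' \\ y_3' \end{bmatrix} = \begin{bmatrix} 3 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \tag{8}$$

or

$$\mathbf{y}' = \begin{bmatrix} 3 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 5 \end{bmatrix} \mathbf{y} \tag{9}$$

(b) Because each equation in 7 involves only one unknown function, we can solve the equations
individually. It follows from 2 that these solutions are

$$y_1 = c_1e^{3x}, \quad y_2 = c_2e^{-2x}, \quad y_3 = c_3e^{5x}$$

or, in matrix notation,

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} c_1e^{3x} \\ c_2e^{-2x} \\ c_3e^{5x} \end{bmatrix}$$

(c) From the given initial conditions, we obtain

$$\begin{aligned} 1 &= y_1(0) = c_1e^{0} = c_1 \\ 4 &= y_2(0) = c_2e^{0} = c_2 \\ -2 &= y_3(0) = c_3e^{0} = c_3 \end{aligned}$$

so the solution satisfying these conditions is

$$y_1 = e^{3x}, \quad y_2 = 4e^{-2x}, \quad y_3 = -2e^{5x}$$

or, in matrix notation,

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} e^{3x} \\ 4e^{-2x} \\ -2e^{5x} \end{bmatrix} \tag{10}$$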
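
The reasoning in Example 1 (each equation of a diagonal system is solved on its own, and the initial value fixes the constant) can be sketched in a few lines. The function name `solve_diagonal_system` is a hypothetical helper introduced here for illustration only.

```python
import math

def solve_diagonal_system(diagonal, initial):
    """Return y(x) for y' = Dy with D = diag(diagonal) and y(0) = initial.

    Each equation y_i' = d_i * y_i has solution y_i = c_i * e^{d_i x},
    and the condition y_i(0) = c_i fixes the constant.
    """
    def y(x):
        return [c * math.exp(d * x) for d, c in zip(diagonal, initial)]
    return y

# Example 1: D = diag(3, -2, 5) with y(0) = (1, 4, -2).
y = solve_diagonal_system([3.0, -2.0, 5.0], [1.0, 4.0, -2.0])
at_zero = y(0.0)   # reproduces the initial conditions
at_one = y(1.0)    # components e^3, 4e^{-2}, -2e^5
```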


Solution  by  Diagonalization 


What  made  the  system  in  Example  1 easy  to  solve  was  the  fact  that  each  equation  involved  only  one  of  the  unknown 
functions,  so  its  matrix  formulation,  y'  = Ay,  had  a diagonal  coefficient  matrix  A [Formula  9].  A more  complicated 
situation  occurs  when  some  or  all  of  the  equations  in  the  system  involve  more  than  one  of  the  unknown  functions,  for 
in  this  case  the  coefficient  matrix  is  not  diagonal.  Let  us  now  consider  how  we  might  solve  such  a system. 


The basic idea for solving a system $\mathbf{y}' = A\mathbf{y}$ whose coefficient matrix $A$ is not diagonal is to introduce a new
unknown vector $\mathbf{u}$ that is related to the unknown vector $\mathbf{y}$ by an equation of the form $\mathbf{y} = P\mathbf{u}$ in which $P$ is an
invertible matrix that diagonalizes $A$. Of course, such a matrix may or may not exist, but if it does, then we can rewrite
the equation $\mathbf{y}' = A\mathbf{y}$ as

$$P\mathbf{u}' = A(P\mathbf{u})$$

or alternatively as

$$\mathbf{u}' = (P^{-1}AP)\mathbf{u}$$

Since $P$ is assumed to diagonalize $A$, this equation has the form

$$\mathbf{u}' = D\mathbf{u}$$

where $D$ is diagonal. We can now solve this equation for $\mathbf{u}$ using the method of Example 1, and then obtain $\mathbf{y}$ by
matrix multiplication using the relationship $\mathbf{y} = P\mathbf{u}$.

In summary, we have the following procedure for solving a system $\mathbf{y}' = A\mathbf{y}$ in the case where $A$ is diagonalizable.


A Procedure for Solving $\mathbf{y}' = A\mathbf{y}$ if $A$ is Diagonalizable

Step 1. Find a matrix $P$ that diagonalizes $A$.

Step 2. Make the substitutions $\mathbf{y} = P\mathbf{u}$ and $\mathbf{y}' = P\mathbf{u}'$ to obtain a new “diagonal system” $\mathbf{u}' = D\mathbf{u}$, where
$D = P^{-1}AP$.

Step 3. Solve $\mathbf{u}' = D\mathbf{u}$.

Step 4. Determine $\mathbf{y}$ from the equation $\mathbf{y} = P\mathbf{u}$.
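
The four steps can be traced in code for the 2 x 2 case. This is a minimal sketch under stated assumptions: it handles only matrices with two distinct real eigenvalues, the function names `eigenpairs_2x2` and `solve_system_2x2` are hypothetical, and the sample matrix is an illustration that does not come from the text.

```python
import math

def eigenpairs_2x2(A):
    """Step 1: eigenvalues and eigenvectors of A = ((a, b), (c, d)).

    Assumes two distinct real eigenvalues (so A is diagonalizable).
    """
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4.0 * det
    if disc <= 0.0:
        raise ValueError("this sketch needs two distinct real eigenvalues")
    root = math.sqrt(disc)
    pairs = []
    for lam in ((trace + root) / 2.0, (trace - root) / 2.0):
        # Solve (A - lam*I)x = 0 using whichever row is not numerically zero.
        if abs(b) > 1e-12 or abs(a - lam) > 1e-12:
            vec = (b, lam - a)
        else:
            vec = (lam - d, c)
        pairs.append((lam, vec))
    return pairs

def solve_system_2x2(A, y0):
    """Return y(x) solving y' = Ay with y(0) = y0."""
    (lam1, v1), (lam2, v2) = eigenpairs_2x2(A)
    p11, p21 = v1          # first column of P
    p12, p22 = v2          # second column of P
    det_p = p11 * p22 - p12 * p21
    # Steps 2-3: u' = Du gives u_i = c_i e^{lam_i x}, with c = P^{-1} y(0).
    c1 = (p22 * y0[0] - p12 * y0[1]) / det_p
    c2 = (-p21 * y0[0] + p11 * y0[1]) / det_p

    def y(x):
        u1 = c1 * math.exp(lam1 * x)
        u2 = c2 * math.exp(lam2 * x)
        # Step 4: y = Pu.
        return (p11 * u1 + p12 * u2, p21 * u1 + p22 * u2)
    return y

# Illustrative system: y1' = 4y1 + y2, y2' = 2y1 + 3y2, y(0) = (3, 0).
y = solve_system_2x2(((4.0, 1.0), (2.0, 3.0)), (3.0, 0.0))
```

For this sample system the eigenvalues are 5 and 2, and the computed solution agrees with $y_1 = 2e^{5x} + e^{2x}$, $y_2 = 2e^{5x} - 2e^{2x}$, which can be confirmed by substitution.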


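EXAMPLE 2 Solution Using Diagonalization

(a) Solve the system

$$\begin{aligned} y_1' &= y_1 + y_2 \\ y_2' &= 4y_1 - 2y_2 \end{aligned}$$

(b) Find the solution that satisfies the initial conditions $y_1(0) = 1$, $y_2(0) = 6$.


Solution

(a) The coefficient matrix for the system is

$$A = \begin{bmatrix} 1 & 1 \\ 4 & -2 \end{bmatrix}$$

As discussed in Section 5.2, $A$ will be diagonalized by any matrix $P$ whose columns are linearly
independent eigenvectors of $A$. Since

$$\det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & -1 \\ -4 & \lambda + 2 \end{vmatrix} = \lambda^2 + \lambda - 6 = (\lambda + 3)(\lambda - 2)$$

the eigenvalues of $A$ are $\lambda = 2$ and $\lambda = -3$. By definition,

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$

is an eigenvector of $A$ corresponding to $\lambda$ if and only if $\mathbf{x}$ is a nontrivial solution of

$$\begin{bmatrix} \lambda - 1 & -1 \\ -4 & \lambda + 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

If $\lambda = 2$, this system becomes

$$\begin{bmatrix} 1 & -1 \\ -4 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

Solving this system yields $x_1 = t$, $x_2 = t$, so

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} t \\ t \end{bmatrix} = t \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

Thus,

$$\mathbf{p}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

is a basis for the eigenspace corresponding to $\lambda = 2$. Similarly, you can show that

$$\mathbf{p}_2 = \begin{bmatrix} -\tfrac{1}{4} \\ 1 \end{bmatrix}$$

is a basis for the eigenspace corresponding to $\lambda = -3$. Thus,

$$P = \begin{bmatrix} 1 & -\tfrac{1}{4} \\ 1 & 1 \end{bmatrix}$$

diagonalizes $A$, and

$$D = P^{-1}AP = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix}$$

Thus, as noted in Step 2 of the procedure stated above, the substitution

$$\mathbf{y} = P\mathbf{u} \quad \text{and} \quad \mathbf{y}' = P\mathbf{u}'$$

yields the “diagonal system”

$$\mathbf{u}' = D\mathbf{u} = \begin{bmatrix} 2 & 0 \\ 0 & -3 \end{bmatrix} \mathbf{u} \quad \text{or} \quad \begin{aligned} u_1' &= 2u_1 \\ u_2' &= -3u_2 \end{aligned}$$

From 2 the solution of this system is

$$\begin{aligned} u_1 &= c_1e^{2x} \\ u_2 &= c_2e^{-3x} \end{aligned} \quad \text{or} \quad \mathbf{u} = \begin{bmatrix} c_1e^{2x} \\ c_2e^{-3x} \end{bmatrix}$$

so the equation $\mathbf{y} = P\mathbf{u}$ yields, as the solution for $\mathbf{y}$,

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 1 & -\tfrac{1}{4} \\ 1 & 1 \end{bmatrix} \begin{bmatrix} c_1e^{2x} \\ c_2e^{-3x} \end{bmatrix} = \begin{bmatrix} c_1e^{2x} - \tfrac{1}{4}c_2e^{-3x} \\ c_1e^{2x} + c_2e^{-3x} \end{bmatrix}$$

or

$$\begin{aligned} y_1 &= c_1e^{2x} - \tfrac{1}{4}c_2e^{-3x} \\ y_2 &= c_1e^{2x} + c_2e^{-3x} \end{aligned} \tag{11}$$

(b) If we substitute the given initial conditions in 11, we obtain

$$\begin{aligned} c_1 - \tfrac{1}{4}c_2 &= 1 \\ c_1 + c_2 &= 6 \end{aligned}$$

Solving this system, we obtain $c_1 = 2$, $c_2 = 4$, so it follows from 11 that the solution satisfying
the initial conditions is

$$\begin{aligned} y_1 &= 2e^{2x} - e^{-3x} \\ y_2 &= 2e^{2x} + 4e^{-3x} \end{aligned}$$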

Keep  in  mind  that  the  method  of  Example  2 works  because  the  coefficient  matrix  of  the  system  can  be 
diagonalized.  In  cases  where  this  is  not  so,  other  methods  are  required.  These  are  typically  discussed  in  books 
devoted  to  differential  equations. 
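
As a check on Example 2, the particular solution $y_1 = 2e^{2x} - e^{-3x}$, $y_2 = 2e^{2x} + 4e^{-3x}$ can be substituted back into the system $y_1' = y_1 + y_2$, $y_2' = 4y_1 - 2y_2$. The sketch below does this at a few sample points; the derivatives are written out by hand, and the sample points are arbitrary choices made for illustration.

```python
import math

def y1(x):
    return 2.0 * math.exp(2.0 * x) - math.exp(-3.0 * x)

def y2(x):
    return 2.0 * math.exp(2.0 * x) + 4.0 * math.exp(-3.0 * x)

def dy1(x):
    # Derivative of y1, computed by hand.
    return 4.0 * math.exp(2.0 * x) + 3.0 * math.exp(-3.0 * x)

def dy2(x):
    # Derivative of y2, computed by hand.
    return 4.0 * math.exp(2.0 * x) - 12.0 * math.exp(-3.0 * x)

# Residuals of the two differential equations at sample points.
max_residual = max(
    max(abs(dy1(x) - (y1(x) + y2(x))),
        abs(dy2(x) - (4.0 * y1(x) - 2.0 * y2(x))))
    for x in (-1.0, 0.0, 0.5, 1.0)
)
initial_values = (y1(0.0), y2(0.0))   # should be (1, 6)
```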


Concept  Review 

Differential  equation 
Order  of  a differential  equation 
General  solution 
Particular  solution 
Initial  condition 
Initial-value  problem 
First-order  linear  system 

Skills 

Find  the  matrix  form  of  a system  of  linear  differential  equations. 

Find  the  general  solution  of  a system  of  linear  differential  equations  by  diagonalization. 

Find  the  particular  solution  of  a system  of  linear  differential  equations  satisfying  an  initial  condition. 


Exercise  Set  5.4 

1. (a) Solve the system

$$\begin{aligned} y_1' &= y_1 + 4y_2 \\ y_2' &= 2y_1 + 3y_2 \end{aligned}$$

(b) Find the solution that satisfies the initial conditions $y_1(0) = 0$, $y_2(0) = 0$.


Answer:

(a) $y_1 = c_1e^{5x} - 2c_2e^{-x}$
$y_2 = c_1e^{5x} + c_2e^{-x}$

(b) $y_1 = 0$
$y_2 = 0$


2. (a) Solve the system

$$\begin{aligned} y_1' &= y_1 + 3y_2 \\ y_2' &= 4y_1 + 5y_2 \end{aligned}$$

(b) Find the solution that satisfies the conditions $y_1(0) = 2$, $y_2(0) = 1$.

3. (a) Solve the system

$$\begin{aligned} y_1' &= 4y_1 + y_3 \\ y_2' &= -2y_1 + y_2 \\ y_3' &= -2y_1 + y_3 \end{aligned}$$

(b) Find the solution that satisfies the initial conditions $y_1(0) = -1$, $y_2(0) = 1$, $y_3(0) = 0$.

Answer:

(a) $y_1 = -c_2e^{2x} + c_3e^{3x}$
$y_2 = c_1e^{x} + 2c_2e^{2x} - c_3e^{3x}$
$y_3 = 2c_2e^{2x} - c_3e^{3x}$

(b) $y_1 = e^{2x} - 2e^{3x}$
$y_2 = e^{x} - 2e^{2x} + 2e^{3x}$
$y_3 = -2e^{2x} + 2e^{3x}$

4. Solve the system

$$\begin{aligned} y_1' &= 4y_1 + 2y_2 + 2y_3 \\ y_2' &= 2y_1 + 4y_2 + 2y_3 \\ y_3' &= 2y_1 + 2y_2 + 4y_3 \end{aligned}$$

5. Show that every solution of $y' = ay$ has the form $y = ce^{ax}$.

[Hint: Let $y = f(x)$ be a solution of the equation, and show that $f(x)e^{-ax}$ is constant.]

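6. Show that if $A$ is diagonalizable and

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

is a solution of the system $\mathbf{y}' = A\mathbf{y}$, then each $y_i$ is a linear combination of $e^{\lambda_1 x}, e^{\lambda_2 x}, \ldots, e^{\lambda_n x}$, where
$\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$.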

7. Sometimes it is possible to solve a single higher-order linear differential equation with constant coefficients by
expressing it as a system and applying the methods of this section. For the differential equation $y'' - y' - 6y = 0$,
show that the substitutions $y_1 = y$ and $y_2 = y'$ lead to the system

$$\begin{aligned} y_1' &= y_2 \\ y_2' &= 6y_1 + y_2 \end{aligned}$$

Solve this system, and use the result to solve the original differential equation.


Answer:

$y = c_1e^{3x} + c_2e^{-2x}$

8. Use the procedure in Exercise 7 to solve $y'' + y' - 12y = 0$.

9. Explain how you might use the procedure in Exercise 7 to solve $y''' - 6y'' + 11y' - 6y = 0$. Use your
procedure to solve the equation.


Answer:

$y = c_1e^{x} + c_2e^{2x} + c_3e^{3x}$

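10. (a) By rewriting 11 in matrix form, show that the solution of the system in Example 2 can be expressed as

$$\mathbf{y} = c_1e^{2x} \begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_2e^{-3x} \begin{bmatrix} -\tfrac{1}{4} \\ 1 \end{bmatrix}$$

This is called the general solution of the system.

(b) Note that in part (a), the vector in the first term is an eigenvector corresponding to the eigenvalue $\lambda_1 = 2$, and
the vector in the second term is an eigenvector corresponding to the eigenvalue $\lambda_2 = -3$. This is a special
case of the following general result:


Theorem. If the coefficient matrix $A$ of the system $\mathbf{y}' = A\mathbf{y}$ is diagonalizable, then the general
solution of the system can be expressed as

$$\mathbf{y} = c_1e^{\lambda_1 x}\mathbf{x}_1 + c_2e^{\lambda_2 x}\mathbf{x}_2 + \cdots + c_ne^{\lambda_n x}\mathbf{x}_n$$

where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$, and $\mathbf{x}_i$ is an eigenvector of $A$ corresponding to $\lambda_i$.


Prove this result by tracing through the four-step procedure preceding Example 2 with

$$D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \quad \text{and} \quad P = [\mathbf{x}_1 \mid \mathbf{x}_2 \mid \cdots \mid \mathbf{x}_n]$$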


11. Consider the system of differential equations $\mathbf{y}' = A\mathbf{y}$, where $A$ is a 2 x 2 matrix. For what values of
$a_{11}, a_{12}, a_{21}, a_{22}$ do the component solutions $y_1(t)$, $y_2(t)$ tend to zero as $t \to \infty$? In particular, what must be
true about the determinant and the trace of $A$ for this to happen?

12. Solve the nondiagonalizable system

$$\begin{aligned} y_1' &= y_1 + y_2 \\ y_2' &= y_2 \end{aligned}$$


True-False  Exercises 

In parts (a)-(e) determine whether the statement is true or false, and justify your answer.

(a) Every system of differential equations $\mathbf{y}' = A\mathbf{y}$ has a solution.

Answer:

False

(b) If $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{y}' = A\mathbf{y}$, then $\mathbf{x} = \mathbf{y}$.

Answer:

False

(c) If $\mathbf{x}' = A\mathbf{x}$ and $\mathbf{y}' = A\mathbf{y}$, then $(c\mathbf{x} + d\mathbf{y})' = A(c\mathbf{x} + d\mathbf{y})$ for all scalars $c$ and $d$.

Answer:

True

(d) If $A$ is a square matrix with distinct real eigenvalues, then it is possible to solve $\mathbf{x}' = A\mathbf{x}$ by diagonalization.

Answer:

True

(e) If $A$ and $P$ are similar matrices, then $\mathbf{y}' = A\mathbf{y}$ and $\mathbf{u}' = P\mathbf{u}$ have the same solutions.

Answer:

False
Supplementary Exercises


1. (a) Show that if $0 < \theta < \pi$, then

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

has no real eigenvalues and consequently no real eigenvectors.

(b) Give a geometric explanation of the result in part (a).


Answer:

(b) The transformation rotates vectors through the angle $\theta$; therefore, if $0 < \theta < \pi$, then no nonzero vector
is transformed into a vector in the same or opposite direction.


2. Find the eigenvalues of

$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ k^3 & -3k^2 & 3k \end{bmatrix}$$

3. (a) Show that if $D$ is a diagonal matrix with nonnegative entries on the main diagonal, then there is a
matrix $S$ such that $S^2 = D$.

(b) Show that if $A$ is a diagonalizable matrix with nonnegative eigenvalues, then there is a matrix $S$ such
that $S^2 = A$.

(c) Find a matrix $S$ such that $S^2 = A$, given that

$$A = \begin{bmatrix} 1 & 3 & 1 \\ 0 & 4 & 5 \\ 0 & 0 & 9 \end{bmatrix}$$


Answer:

(c)

$$S = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 3 \end{bmatrix}$$


4. Prove: If $A$ is a square matrix, then $A$ and $A^T$ have the same characteristic polynomial.

5. Prove: If $A$ is a square matrix and $p(\lambda) = \det(\lambda I - A)$ is the characteristic polynomial of $A$, then the
coefficient of $\lambda^{n-1}$ in $p(\lambda)$ is the negative of the trace of $A$.

6. Prove: If $b \neq 0$, then

$$A = \begin{bmatrix} a & b \\ 0 & a \end{bmatrix}$$

is not diagonalizable.

7. In advanced linear algebra, one proves the Cayley-Hamilton Theorem, which states that a square matrix
$A$ satisfies its characteristic equation; that is, if

$$c_0 + c_1\lambda + c_2\lambda^2 + \cdots + c_{n-1}\lambda^{n-1} + \lambda^n = 0$$

is the characteristic equation of $A$, then

$$c_0I + c_1A + c_2A^2 + \cdots + c_{n-1}A^{n-1} + A^n = 0$$

Verify this result for

(a) $A = \begin{bmatrix} 3 & 6 \\ 1 & 2 \end{bmatrix}$

(b) $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & -3 & 3 \end{bmatrix}$

In Exercises 8-10, use the Cayley-Hamilton Theorem, stated in Exercise 7.


8. (a) Use Exercise 18 of Section 5.1 to prove the Cayley-Hamilton Theorem for 2 x 2 matrices.

(b) Prove the Cayley-Hamilton Theorem for n x n diagonalizable matrices.


9. The Cayley-Hamilton Theorem provides a method for calculating powers of a matrix. For example, if $A$
is a 2 x 2 matrix with characteristic equation

$$c_0 + c_1\lambda + \lambda^2 = 0$$

then $c_0I + c_1A + A^2 = 0$, so

$$A^2 = -c_1A - c_0I$$

Multiplying through by $A$ yields $A^3 = -c_1A^2 - c_0A$, which expresses $A^3$ in terms of $A^2$ and $A$, and
multiplying through by $A^2$ yields $A^4 = -c_1A^3 - c_0A^2$, which expresses $A^4$ in terms of $A^3$ and $A^2$.
Continuing in this way, we can calculate successive powers of $A$ by expressing them in terms of lower
powers. Use this procedure to calculate $A^2$, $A^3$, $A^4$, and $A^5$ for

$$A = \begin{bmatrix} 3 & 6 \\ 1 & 2 \end{bmatrix}$$


Answer:

$$A^2 = \begin{bmatrix} 15 & 30 \\ 5 & 10 \end{bmatrix}, \quad A^3 = \begin{bmatrix} 75 & 150 \\ 25 & 50 \end{bmatrix}, \quad A^4 = \begin{bmatrix} 375 & 750 \\ 125 & 250 \end{bmatrix}, \quad A^5 = \begin{bmatrix} 1875 & 3750 \\ 625 & 1250 \end{bmatrix}$$

10. Use the method of the preceding exercise to calculate $A^3$ and $A^4$ for

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & -3 & 3 \end{bmatrix}$$


11. Find the eigenvalues of the matrix

$$A = \begin{bmatrix} c_1 & c_2 & \cdots & c_n \\ c_1 & c_2 & \cdots & c_n \\ \vdots & \vdots & & \vdots \\ c_1 & c_2 & \cdots & c_n \end{bmatrix}$$


Answer:

$0$, $\operatorname{tr}(A)$

12. (a) It was shown in Exercise 17 of Section 5.1 that if $A$ is an n x n matrix, then the coefficient of $\lambda^n$ in
the characteristic polynomial of $A$ is 1. (A polynomial with this property is called monic.) Show that
the matrix

$$\begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & -c_0 \\ 1 & 0 & 0 & \cdots & 0 & -c_1 \\ 0 & 1 & 0 & \cdots & 0 & -c_2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -c_{n-1} \end{bmatrix}$$

has characteristic polynomial

$$p(\lambda) = c_0 + c_1\lambda + \cdots + c_{n-1}\lambda^{n-1} + \lambda^n$$

This shows that every monic polynomial is the characteristic polynomial of some matrix. The matrix
in this example is called the companion matrix of $p(\lambda)$. [Hint: Evaluate all determinants in the
problem by adding a multiple of the second row to the first to introduce a zero at the top of the first
column, and then expanding by cofactors along the first column.]

(b) Find a matrix with characteristic polynomial

$$p(\lambda) = 1 - 2\lambda + \lambda^2 + 3\lambda^3 + \lambda^4$$

13. A square matrix $A$ is called nilpotent if $A^n = 0$ for some positive integer $n$. What can you say about the
eigenvalues of a nilpotent matrix?

Answer:

They are all 0.

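14. Prove: If $A$ is an n x n matrix and $n$ is odd, then $A$ has at least one real eigenvalue.

15. Find a 3 x 3 matrix $A$ that has eigenvalues $\lambda = 0$, $1$, and $-1$ with corresponding eigenvectors

$$\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$

respectively.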


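Answer:

$$A = \begin{bmatrix} 1 & 0 & 0 \\ -1 & -\tfrac{1}{2} & -\tfrac{1}{2} \\ 1 & -\tfrac{1}{2} & -\tfrac{1}{2} \end{bmatrix}$$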


16. Suppose that a 4 x 4 matrix $A$ has eigenvalues $\lambda_1 = 1$, $\lambda_2 = -2$, $\lambda_3 = 3$, and $\lambda_4 = -3$.

(a) Use the method of Exercise 16 of Section 5.1 to find $\det(A)$.

(b) Use Exercise 5 above to find $\operatorname{tr}(A)$.

17. Let $A$ be a square matrix such that $A^3 = A$. What can you say about the eigenvalues of $A$?


Answer:

They are all 0, 1, or -1.

18. (a) Solve the system

$$\begin{aligned} y_1' &= y_1 + 3y_2 \\ y_2' &= 2y_1 + 4y_2 \end{aligned}$$

(b) Find the solution satisfying the initial conditions $y_1(0) = 5$ and $y_2(0) = 6$.
CHAPTER 6
Inner Product Spaces


CHAPTER  CONTENTS 

Inner  Products 

Angle  and  Orthogonality  in  Inner  Product  Spaces 
Gram-Schmidt Process; QR-Decomposition
Best  Approximation;  Least  Squares 
Least  Squares  Fitting  to  Data 
Function  Approximation;  Fourier  Series 


INTRODUCTION 

In  Chapter  3 we  defined  the  dot  product  of  vectors  in  Rn,  and  we  used  that  concept  to 
define  notions  of  length,  angle,  distance,  and  orthogonality.  In  this  chapter  we  will 
generalize  those  ideas  so  they  are  applicable  in  any  vector  space,  not  just  Rn.  We  will  also 
discuss  various  applications  of  these  ideas. 
6.1 Inner Products

In  this  section  we  will  use  the  most  important  properties  of  the  dot  product  on  Rn  as  axioms,  which,  if  satisfied  by  the  vectors 
in  a vector  space  V,  will  enable  us  to  extend  the  notions  of  length,  distance,  angle,  and  perpendicularity  to  general  vector 
spaces. 


General  Inner  Products 

In  Definition  4 of  Section  3.2  we  defined  the  dot  product  of  two  vectors  in  Rn,  and  in  Theorem  3.2.2  we  listed  four 
fundamental  properties  of  such  products.  Our  first  goal  in  this  section  is  to  extend  the  notion  of  a dot  product  to  general  real 
vector  spaces  by  using  those  four  properties  as  axioms.  We  make  the  following  definition. 

Note  that  Definition  1 applies  only  to  real  vector 
spaces.  A definition  of  inner  products  on  complex 
vector  spaces  is  given  in  the  exercises.  Since  we  will 
have  little  need  for  complex  vector  spaces  from  this 
point  on,  you  can  assume  that  all  vector  spaces  under 
discussion  are  real,  even  though  some  of  the  theorems 
are  also  valid  in  complex  vector  spaces. 


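DEFINITION 1

An inner product on a real vector space $V$ is a function that associates a real number $\langle \mathbf{u}, \mathbf{v} \rangle$ with each pair of vectors in
$V$ in such a way that the following axioms are satisfied for all vectors $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ in $V$ and all scalars $k$.

1. $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$ [Symmetry axiom]

2. $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$ [Additivity axiom]

3. $\langle k\mathbf{u}, \mathbf{v} \rangle = k\langle \mathbf{u}, \mathbf{v} \rangle$ [Homogeneity axiom]

4. $\langle \mathbf{v}, \mathbf{v} \rangle \ge 0$ and $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ if and only if $\mathbf{v} = \mathbf{0}$ [Positivity axiom]

A real vector space with an inner product is called a real inner product space.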

Because  the  axioms  for  a real  inner  product  space  are  based  on  properties  of  the  dot  product,  these  inner  product  space  axioms 
will  be  satisfied  automatically  if  we  define  the  inner  product  of  two  vectors  u and  v in  Rn  to  be 

$$\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v} = u_1v_1 + u_2v_2 + \cdots + u_nv_n$$

This  inner  product  is  commonly  called  the  Euclidean  inner  product  (or  the  standard  inner  product ) on  Rn  to  distinguish  it 
from  other  possible  inner  products  that  might  be  defined  on  Rn.  We  call  Rn  with  the  Euclidean  inner  product  Euclidean 
n-space. 

Inner products can be used to define notions of norm and distance in a general inner product space just as we did with dot
products in $R^n$. Recall from Formulas 11 and 19 of Section 3.2 that if $\mathbf{u}$ and $\mathbf{v}$ are vectors in Euclidean n-space, then norm and
distance can be expressed in terms of the dot product as

$$\|\mathbf{v}\| = \sqrt{\mathbf{v} \cdot \mathbf{v}} \quad \text{and} \quad d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{(\mathbf{u} - \mathbf{v}) \cdot (\mathbf{u} - \mathbf{v})}$$

Motivated  by  these  formulas  we  make  the  following  definition. 


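DEFINITION 2

If $V$ is a real inner product space, then the norm (or length) of a vector $\mathbf{v}$ in $V$ is denoted by $\|\mathbf{v}\|$ and is defined by

$$\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$$

and the distance between two vectors is denoted by $d(\mathbf{u}, \mathbf{v})$ and is defined by

$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \sqrt{\langle \mathbf{u} - \mathbf{v}, \mathbf{u} - \mathbf{v} \rangle}$$

A vector of norm 1 is called a unit vector.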


The  following  theorem,  which  we  state  without  proof,  shows  that  norms  and  distances  in  real  inner  product  spaces  have  many 
of  the  properties  that  you  might  expect. 


THEOREM  6.1.1 

If  u and  v are  vectors  in  a real  inner  product  space  V,  and  if  k is  a scalar,  then: 

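(a) $\|\mathbf{v}\| \ge 0$ with equality if and only if $\mathbf{v} = \mathbf{0}$.

(b) $\|k\mathbf{v}\| = |k| \, \|\mathbf{v}\|$.

(c) $d(\mathbf{u}, \mathbf{v}) = d(\mathbf{v}, \mathbf{u})$.

(d) $d(\mathbf{u}, \mathbf{v}) \ge 0$ with equality if and only if $\mathbf{u} = \mathbf{v}$.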


Although the Euclidean inner product is the most important inner product on $R^n$, there are various applications in which it is
desirable to modify it by weighting each term differently. More precisely, if

$$w_1, w_2, \ldots, w_n$$

are positive real numbers, which we will call weights, and if $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$ are vectors in $R^n$,
then it can be shown that the formula

$$\langle \mathbf{u}, \mathbf{v} \rangle = w_1u_1v_1 + w_2u_2v_2 + \cdots + w_nu_nv_n \tag{1}$$

defines an inner product on $R^n$ that we call the weighted Euclidean inner product with weights $w_1, w_2, \ldots, w_n$.

Note that the standard Euclidean inner product is the
special case of the weighted Euclidean inner product in
which all the weights are 1.


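EXAMPLE 1 Weighted Euclidean Inner Product

Let $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$ be vectors in $R^2$. Verify that the weighted Euclidean inner product

$$\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 2u_2v_2 \tag{2}$$

satisfies the four inner product axioms.


Solution

Axiom 1: Interchanging $\mathbf{u}$ and $\mathbf{v}$ in Formula 2 does not change the sum on the right side, so
$\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$.

Axiom 2: If $\mathbf{w} = (w_1, w_2)$, then

$$\begin{aligned}
\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle &= 3(u_1 + v_1)w_1 + 2(u_2 + v_2)w_2 \\
&= 3(u_1w_1 + v_1w_1) + 2(u_2w_2 + v_2w_2) \\
&= (3u_1w_1 + 2u_2w_2) + (3v_1w_1 + 2v_2w_2) \\
&= \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle
\end{aligned}$$

Axiom 3:

$$\begin{aligned}
\langle k\mathbf{u}, \mathbf{v} \rangle &= 3(ku_1)v_1 + 2(ku_2)v_2 \\
&= k(3u_1v_1 + 2u_2v_2) \\
&= k\langle \mathbf{u}, \mathbf{v} \rangle
\end{aligned}$$

Axiom 4: $\langle \mathbf{v}, \mathbf{v} \rangle = 3(v_1v_1) + 2(v_2v_2) = 3v_1^2 + 2v_2^2 \ge 0$ with equality if and only if $v_1 = v_2 = 0$; that is, if
and only if $\mathbf{v} = \mathbf{0}$.

In Example 1, we are using subscripted w's to
denote the components of the vector $\mathbf{w}$, not the
weights. The weights are the numbers 3 and 2 in
Formula 2.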
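
The four axioms verified in Example 1 can also be spot-checked numerically. The sketch below tests the weighted product with weights 3 and 2 on randomly chosen integer vectors; the function name `weighted_inner`, the random seed, and the sample size are illustrative assumptions, and a passing check on samples is evidence only, not a proof.

```python
import random

def weighted_inner(u, v, weights=(3, 2)):
    """Weighted Euclidean inner product <u, v> = 3*u1*v1 + 2*u2*v2."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))

random.seed(0)

def rand_vec():
    return tuple(random.randint(-9, 9) for _ in range(2))

checks = []
for _ in range(100):
    u, v, w = rand_vec(), rand_vec(), rand_vec()
    k = random.randint(-9, 9)
    u_plus_v = tuple(a + b for a, b in zip(u, v))
    ku = tuple(k * a for a in u)
    checks.append(
        weighted_inner(u, v) == weighted_inner(v, u)                      # symmetry
        and weighted_inner(u_plus_v, w)
            == weighted_inner(u, w) + weighted_inner(v, w)                # additivity
        and weighted_inner(ku, v) == k * weighted_inner(u, v)             # homogeneity
        and weighted_inner(v, v) >= 0                                     # positivity
        and ((weighted_inner(v, v) == 0) == (v == (0, 0)))
    )

all_axioms_hold = all(checks)
```

Integer vectors are used so that every comparison is exact rather than subject to floating-point rounding.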


An  Application  of  Weighted  Euclidean  Inner  Products 

To illustrate one way in which a weighted Euclidean inner product can arise, suppose that some physical experiment has $n$
possible numerical outcomes

$$x_1, x_2, \ldots, x_n$$

and that a series of $m$ repetitions of the experiment yields these values with various frequencies. Specifically, suppose that $x_1$
occurs $f_1$ times, $x_2$ occurs $f_2$ times, and so forth. Since there are a total of $m$ repetitions of the experiment, it follows that

$$f_1 + f_2 + \cdots + f_n = m$$

Thus, the arithmetic average of the observed numerical values (denoted by $\bar{x}$) is

$$\bar{x} = \frac{f_1x_1 + f_2x_2 + \cdots + f_nx_n}{f_1 + f_2 + \cdots + f_n} = \frac{1}{m}(f_1x_1 + f_2x_2 + \cdots + f_nx_n) \tag{3}$$

If we let

$$\begin{aligned}
\mathbf{f} &= (f_1, f_2, \ldots, f_n) \\
\mathbf{x} &= (x_1, x_2, \ldots, x_n) \\
w_1 &= w_2 = \cdots = w_n = 1/m
\end{aligned}$$

then 3 can be expressed as the weighted Euclidean inner product

$$\bar{x} = \langle \mathbf{f}, \mathbf{x} \rangle = w_1f_1x_1 + w_2f_2x_2 + \cdots + w_nf_nx_n$$
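
To make the averaging formula concrete, the following sketch computes an arithmetic mean as a weighted Euclidean inner product with all weights equal to 1/m. The outcome values and frequencies are made-up illustrative data, not taken from the text.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4]        # x_1, ..., x_n (hypothetical data)
frequencies = [2, 5, 2, 1]     # f_1, ..., f_n (hypothetical data)
m = sum(frequencies)           # total number of repetitions

# All weights equal 1/m.
weights = [Fraction(1, m)] * len(outcomes)

# Weighted Euclidean inner product <f, x> = sum of w_i * f_i * x_i.
mean_as_inner_product = sum(
    w * f * x for w, f, x in zip(weights, frequencies, outcomes)
)

# Direct computation of the arithmetic average for comparison.
direct_mean = Fraction(sum(f * x for f, x in zip(frequencies, outcomes)), m)
```

Exact fractions are used so the two computations can be compared without rounding error.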


EXAMPLE  2 Using  a Weighted  Euclidean  Inner  Product 


It is important to keep in mind that norm and distance depend on the inner product being used. If the inner product
is changed, then the norms and distances between vectors also change. For example, for the vectors $\mathbf{u} = (1, 0)$ and
$\mathbf{v} = (0, 1)$ in $R^2$ with the Euclidean inner product we have

$$\|\mathbf{u}\| = \sqrt{1^2 + 0^2} = 1$$

and

$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \|(1, -1)\| = \sqrt{1^2 + (-1)^2} = \sqrt{2}$$

but if we change to the weighted Euclidean inner product

$$\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 2u_2v_2$$

we have

$$\|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{1/2} = [3(1)(1) + 2(0)(0)]^{1/2} = \sqrt{3}$$

and

$$d(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \langle (1, -1), (1, -1) \rangle^{1/2} = [3(1)(1) + 2(-1)(-1)]^{1/2} = \sqrt{5}$$
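
The dependence of norm and distance on the inner product can be reproduced with a short sketch. The helper names `inner`, `norm`, and `distance` are illustrative; the weights (1, 1) give the Euclidean inner product and (3, 2) give the weighted one used above.

```python
import math

def inner(u, v, weights):
    """Weighted Euclidean inner product with the given weights."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))

def norm(u, weights):
    return math.sqrt(inner(u, u, weights))

def distance(u, v, weights):
    diff = tuple(a - b for a, b in zip(u, v))
    return norm(diff, weights)

u, v = (1, 0), (0, 1)
euclidean, weighted = (1, 1), (3, 2)

euclidean_results = (norm(u, euclidean), distance(u, v, euclidean))
weighted_results = (norm(u, weighted), distance(u, v, weighted))
```

The same pair of vectors has norm 1 and distance $\sqrt{2}$ under the first set of weights, and norm $\sqrt{3}$ and distance $\sqrt{5}$ under the second.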


Unit  Circles  and  Spheres  in  Inner  Product  Spaces 

If $V$ is an inner product space, then the set of points in $V$ that satisfy

$$\|\mathbf{u}\| = 1$$

is called the unit sphere or sometimes the unit circle in $V$.

EXAMPLE  3 Unusual  Unit  Circles  in  R2 

(a) Sketch the unit circle in an xy-coordinate system in $R^2$ using the Euclidean inner product
$\langle \mathbf{u}, \mathbf{v} \rangle = u_1v_1 + u_2v_2$.

(b) Sketch the unit circle in an xy-coordinate system in $R^2$ using the weighted Euclidean inner product
$\langle \mathbf{u}, \mathbf{v} \rangle = \tfrac{1}{9}u_1v_1 + \tfrac{1}{4}u_2v_2$.


Solution

(a) If $\mathbf{u} = (x, y)$, then $\|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{1/2} = \sqrt{x^2 + y^2}$, so the equation of the unit circle is $\sqrt{x^2 + y^2} = 1$, or, on
squaring both sides,

$$x^2 + y^2 = 1$$

As expected, the graph of this equation is a circle of radius 1 centered at the origin (Figure 6.1.1a).

(b) If $\mathbf{u} = (x, y)$, then $\|\mathbf{u}\| = \langle \mathbf{u}, \mathbf{u} \rangle^{1/2} = \sqrt{\tfrac{1}{9}x^2 + \tfrac{1}{4}y^2}$, so the equation of the unit circle is
$\sqrt{\tfrac{1}{9}x^2 + \tfrac{1}{4}y^2} = 1$, or, on squaring both sides,

$$\frac{x^2}{9} + \frac{y^2}{4} = 1$$

The graph of this equation is the ellipse shown in Figure 6.1.1b.


Figure 6.1.1 (a) The unit circle using the standard Euclidean inner product. (b) The unit circle using a weighted Euclidean inner product.

It  may  seem  odd  that  the  “unit  circle”  in  the  second  part  of  the  last  example  turned  out  to  have  an  elliptical  shape. 
This  will  make  more  sense  if  you  think  of  circles  and  spheres  in  general  vector  spaces  algebraically  (||u||  = 1)  rather  than 
geometrically.  The  change  in  geometry  occurs  because  the  norm,  not  being  Euclidean,  has  the  effect  of  distorting  the  space  that 
we  are  used  to  seeing  through  “Euclidean  eyes.” 


Inner  Products  Generated  by  Matrices 

The Euclidean inner product and the weighted Euclidean inner products are special cases of a general class of inner products
on $R^n$ called matrix inner products. To define this class of inner products, let $\mathbf{u}$ and $\mathbf{v}$ be vectors in $R^n$ that are expressed in
column form, and let $A$ be an invertible n x n matrix. It can be shown (Exercise 31) that if $\mathbf{u} \cdot \mathbf{v}$ is the Euclidean inner product
on $R^n$, then the formula

$$\langle \mathbf{u}, \mathbf{v} \rangle = A\mathbf{u} \cdot A\mathbf{v} \tag{4}$$

also defines an inner product; it is called the inner product on $R^n$ generated by $A$.

Recall from Table 1 of Section 3.2 that if $\mathbf{u}$ and $\mathbf{v}$ are in column form, then $\mathbf{u} \cdot \mathbf{v}$ can be written as $\mathbf{v}^T\mathbf{u}$, from which it follows
that 4 can be expressed as

$$\langle \mathbf{u}, \mathbf{v} \rangle = (A\mathbf{v})^TA\mathbf{u}$$

or, equivalently, as

$$\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{v}^TA^TA\mathbf{u} \tag{5}$$


EXAMPLE  4 Matrices  Generating  Weighted  Euclidean  Inner  Products 


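The standard Euclidean and weighted Euclidean inner products are examples of matrix inner products. The
standard Euclidean inner product on $R^n$ is generated by the n x n identity matrix, since setting $A = I$ in Formula
4 yields

$$\langle \mathbf{u}, \mathbf{v} \rangle = I\mathbf{u} \cdot I\mathbf{v} = \mathbf{u} \cdot \mathbf{v}$$

and the weighted Euclidean inner product

$$\langle \mathbf{u}, \mathbf{v} \rangle = w_1u_1v_1 + w_2u_2v_2 + \cdots + w_nu_nv_n \tag{6}$$

is generated by the matrix

$$A = \begin{bmatrix} \sqrt{w_1} & 0 & \cdots & 0 \\ 0 & \sqrt{w_2} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sqrt{w_n} \end{bmatrix} \tag{7}$$

This can be seen by first observing that $A^TA$ is the n x n diagonal matrix whose diagonal entries are the weights
$w_1, w_2, \ldots, w_n$ and then observing that 5 simplifies to 6 when $A$ is the matrix in Formula 7.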


EXAMPLE 5 Example 1 Revisited

The weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 2u_2v_2$ discussed in Example 1 is the inner product on
$R^2$ generated by

$$A = \begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{2} \end{bmatrix}$$

Every diagonal matrix with positive diagonal
entries generates a weighted inner product.
Why?
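
A small sketch can confirm that the matrix with diagonal entries $\sqrt{3}$ and $\sqrt{2}$ generates the weighted product with weights 3 and 2. The helper names `mat_vec`, `dot`, and `matrix_inner`, and the sample vectors, are illustrative assumptions.

```python
import math

def mat_vec(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matrix_inner(A, u, v):
    """Inner product generated by A: <u, v> = Au . Av."""
    return dot(mat_vec(A, u), mat_vec(A, v))

A = [[math.sqrt(3), 0.0],
     [0.0, math.sqrt(2)]]
u, v = (1.0, 2.0), (3.0, 4.0)

generated = matrix_inner(A, u, v)
weighted = 3 * u[0] * v[0] + 2 * u[1] * v[1]
```

The two values agree up to floating-point rounding, because $A^TA$ is the diagonal matrix with entries 3 and 2.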


Other  Examples  of  Inner  Products 

So  far,  we  have  only  considered  examples  of  inner  products  on  Rn.  We  will  now  consider  examples  of  inner  products  on  some 
of  the  other  kinds  of  vector  spaces  that  we  discussed  earlier. 

EXAMPLE  6 An  Inner  Product  on  Mnn 


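If $U$ and $V$ are n x n matrices, then the formula

$$\langle U, V \rangle = \operatorname{tr}(U^TV) \tag{8}$$

defines an inner product on the vector space $M_{nn}$ (see Definition 8 of Section 1.3 for a definition of trace). This
can be proved by confirming that the four inner product space axioms are satisfied, but you can visualize why
this is so by computing 8 for the 2 x 2 matrices

$$U = \begin{bmatrix} u_1 & u_2 \\ u_3 & u_4 \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} v_1 & v_2 \\ v_3 & v_4 \end{bmatrix}$$

This yields

$$\langle U, V \rangle = \operatorname{tr}(U^TV) = u_1v_1 + u_2v_2 + u_3v_3 + u_4v_4$$

which is just the dot product of the corresponding entries in the two matrices. For example, if

$$U = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} -1 & 0 \\ 3 & 2 \end{bmatrix}$$

then

$$\langle U, V \rangle = 1(-1) + 2(0) + 3(3) + 4(2) = 16$$

The norm of a matrix $U$ relative to this inner product is

$$\|U\| = \langle U, U \rangle^{1/2} = \sqrt{u_1^2 + u_2^2 + u_3^2 + u_4^2}$$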
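
The claim that the trace formula reduces to an entrywise dot product can be checked on the numerical matrices of this example. This is a minimal sketch; the helper names `transpose`, `mat_mul`, `trace`, and `trace_inner` are illustrative.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_mul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def trace_inner(U, V):
    """Inner product <U, V> = tr(U^T V) on square matrices."""
    return trace(mat_mul(transpose(U), V))

U = [[1, 2], [3, 4]]
V = [[-1, 0], [3, 2]]

via_trace = trace_inner(U, V)
# Dot product of corresponding entries, for comparison.
entrywise = sum(u * v for ru, rv in zip(U, V) for u, v in zip(ru, rv))
norm_squared = trace_inner(U, U)   # 1 + 4 + 9 + 16
```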


EXAMPLE  7 The  Standard  Inner  Product  on  Pn 

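If

$$\mathbf{p} = a_0 + a_1x + \cdots + a_nx^n \quad \text{and} \quad \mathbf{q} = b_0 + b_1x + \cdots + b_nx^n$$

are polynomials in $P_n$, then the following formula defines an inner product on $P_n$ (verify) that we will call the
standard inner product on this space:

$$\langle \mathbf{p}, \mathbf{q} \rangle = a_0b_0 + a_1b_1 + \cdots + a_nb_n \tag{9}$$

The norm of a polynomial $\mathbf{p}$ relative to this inner product is

$$\|\mathbf{p}\| = \sqrt{\langle \mathbf{p}, \mathbf{p} \rangle} = \sqrt{a_0^2 + a_1^2 + \cdots + a_n^2}$$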


EXAMPLE  8 The  Evaluation  Inner  Product  on  Pn 

If

$$\mathbf{p} = p(x) = a_0 + a_1 x + \cdots + a_n x^n \quad \text{and} \quad \mathbf{q} = q(x) = b_0 + b_1 x + \cdots + b_n x^n$$

are polynomials in $P_n$, and if $x_0, x_1, \ldots, x_n$ are distinct real numbers (called sample points), then the formula

$$\langle \mathbf{p}, \mathbf{q} \rangle = p(x_0)q(x_0) + p(x_1)q(x_1) + \cdots + p(x_n)q(x_n) \qquad (10)$$

defines an inner product on $P_n$ called the evaluation inner product at $x_0, x_1, \ldots, x_n$. Algebraically, this can be
viewed as the dot product in $R^{n+1}$ of the $(n+1)$-tuples

$$(p(x_0), p(x_1), \ldots, p(x_n)) \quad \text{and} \quad (q(x_0), q(x_1), \ldots, q(x_n))$$

and hence the first three inner product axioms follow from properties of the dot product. The fourth inner
product axiom follows from the fact that

$$\langle \mathbf{p}, \mathbf{p} \rangle = [p(x_0)]^2 + [p(x_1)]^2 + \cdots + [p(x_n)]^2 \ge 0$$

with equality holding if and only if

$$p(x_0) = p(x_1) = \cdots = p(x_n) = 0$$

But  a nonzero  polynomial  of  degree  n or  less  can  have  at  most  n distinct  roots,  so  it  must  be  that  p = 0,  which 
proves  that  the  fourth  inner  product  axiom  holds. 

The  norm  of  a polynomial  p relative  to  the  evaluation  inner  product  is 

$$\|\mathbf{p}\| = \sqrt{\langle \mathbf{p}, \mathbf{p} \rangle} = \sqrt{[p(x_0)]^2 + [p(x_1)]^2 + \cdots + [p(x_n)]^2} \qquad (11)$$


EXAMPLE  9 Working  with  the  Evaluation  Inner  Product 

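Let $P_2$ have the evaluation inner product at the points

$$x_0 = -2, \quad x_1 = 0, \quad \text{and} \quad x_2 = 2$$

Compute $\langle \mathbf{p}, \mathbf{q} \rangle$ and $\|\mathbf{p}\|$ for the polynomials $\mathbf{p} = p(x) = x^2$ and $\mathbf{q} = q(x) = 1 + x$.


It follows from Formulas 10 and 11 that

$$\langle \mathbf{p}, \mathbf{q} \rangle = p(-2)q(-2) + p(0)q(0) + p(2)q(2) = (4)(-1) + (0)(1) + (4)(3) = 8$$

$$\|\mathbf{p}\| = \sqrt{[p(x_0)]^2 + [p(x_1)]^2 + [p(x_2)]^2} = \sqrt{[p(-2)]^2 + [p(0)]^2 + [p(2)]^2}$$

$$= \sqrt{4^2 + 0^2 + 4^2} = \sqrt{32} = 4\sqrt{2}$$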
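The arithmetic in Example 9 can be reproduced with a short Python sketch. It takes $p(x) = x^2$ and $q(x) = 1 + x$, the polynomials that the displayed values $p(\pm 2) = 4$ and $q(-2) = -1$, $q(2) = 3$ correspond to. The helper names `evaluation_inner` and `evaluation_norm` are assumptions made for illustration.

```python
import math


def evaluation_inner(p, q, points):
    """Evaluation inner product: sum of p(x_i) * q(x_i) over the sample points."""
    return sum(p(x) * q(x) for x in points)


def evaluation_norm(p, points):
    """Norm relative to the evaluation inner product (Formula 11)."""
    return math.sqrt(evaluation_inner(p, p, points))


points = [-2, 0, 2]          # sample points x0, x1, x2


def p(x):
    return x ** 2


def q(x):
    return 1 + x


print(evaluation_inner(p, q, points))  # 8
print(evaluation_norm(p, points))      # sqrt(32) = 4 * sqrt(2)
```

Passing the polynomials as ordinary functions keeps the sketch independent of how their coefficients are stored.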


CALCULUS  REQUIRED 

EXAMPLE 10 An Inner Product on C[a, b]

Let $\mathbf{f} = f(x)$ and $\mathbf{g} = g(x)$ be two functions in $C[a, b]$ and define

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_a^b f(x)g(x)\,dx \qquad (12)$$


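We will show that this formula defines an inner product on $C[a, b]$ by verifying the four inner product axioms
for functions $\mathbf{f} = f(x)$, $\mathbf{g} = g(x)$, and $\mathbf{h} = h(x)$ in $C[a, b]$:

1.

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_a^b f(x)g(x)\,dx = \int_a^b g(x)f(x)\,dx = \langle \mathbf{g}, \mathbf{f} \rangle$$

which proves that Axiom 1 holds.

2.

$$\langle \mathbf{f} + \mathbf{g}, \mathbf{h} \rangle = \int_a^b (f(x) + g(x))h(x)\,dx = \int_a^b f(x)h(x)\,dx + \int_a^b g(x)h(x)\,dx = \langle \mathbf{f}, \mathbf{h} \rangle + \langle \mathbf{g}, \mathbf{h} \rangle$$

which proves that Axiom 2 holds.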


3.

$$\langle k\mathbf{f}, \mathbf{g} \rangle = \int_a^b k f(x)g(x)\,dx = k\int_a^b f(x)g(x)\,dx = k\langle \mathbf{f}, \mathbf{g} \rangle$$

which proves that Axiom 3 holds.

4. If $\mathbf{f} = f(x)$ is any function in $C[a, b]$, then

$$\langle \mathbf{f}, \mathbf{f} \rangle = \int_a^b f^2(x)\,dx \ge 0 \qquad (13)$$

since $f^2(x) \ge 0$ for all x in the interval [a, b]. Moreover, because f is continuous on [a, b], the equality
holds in Formula 13 if and only if the function f is identically zero on [a, b], that is, if and only if $\mathbf{f} = \mathbf{0}$; and
this proves that Axiom 4 holds.


CALCULUS  REQUIRED 

EXAMPLE 11 Norm of a Vector in C[a, b]


If $C[a, b]$ has the inner product that was defined in Example 10, then the norm of a function $\mathbf{f} = f(x)$ relative
to this inner product is

$$\|\mathbf{f}\| = \langle \mathbf{f}, \mathbf{f} \rangle^{1/2} = \sqrt{\int_a^b f^2(x)\,dx} \qquad (14)$$

and the unit sphere in this space consists of all functions f in $C[a, b]$ that satisfy the equation

$$\int_a^b f^2(x)\,dx = 1$$

Note that the vector space $P_n$ is a subspace of $C[a, b]$ because polynomials are continuous functions. Thus,
Formula 12 defines an inner product on $P_n$.

Recall from calculus that the arc length of a curve $y = f(x)$ over an interval [a, b] is given by the formula

$$L = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx \qquad (15)$$

Do not confuse this concept of arc length with $\|\mathbf{f}\|$, which is the length (norm) of f when f is viewed as a vector in $C[a, b]$.
Formulas 14 and 15 are quite different.
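To see concretely that the norm of a function and the arc length of its graph are different quantities, here is a minimal Python sketch that approximates both for $f(x) = x$ on $[0, 1]$. The integrator `simpson` is a hand-written composite Simpson's rule, an assumption chosen for illustration rather than a method used in the text; the arc length integrand is the standard calculus formula $\sqrt{1 + [f'(x)]^2}$.

```python
import math


def simpson(func, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def function_norm(f, a, b):
    """||f|| = sqrt( integral of f(x)^2 over [a, b] )."""
    return math.sqrt(simpson(lambda x: f(x) ** 2, a, b))


def arc_length(df, a, b):
    """Arc length of y = f(x), given the derivative df."""
    return simpson(lambda x: math.sqrt(1 + df(x) ** 2), a, b)


def f(x):
    return x


def df(x):
    return 1.0


print(function_norm(f, 0, 1))  # about 0.577, that is 1/sqrt(3)
print(arc_length(df, 0, 1))    # about 1.414, that is sqrt(2)
```

The two printed values differ, which illustrates the warning above: the norm measures the size of f as a vector in $C[a, b]$, not the length of its graph.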


Algebraic  Properties  of  Inner  Products 

The  following  theorem  lists  some  of  the  algebraic  properties  of  inner  products  that  follow  from  the  inner  product  axioms.  This 
result  is  a generalization  of  Theorem  3.2.3,  which  applied  only  to  the  dot  product  on  Rn. 


THEOREM  6.1.2 


If  u,  v,  and  w are  vectors  in  a real  inner  product  space  V,  and  if  k is  a scalar,  then 


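(a) $\langle \mathbf{0}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{0} \rangle = 0$

(b) $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$

(c) $\langle \mathbf{u}, \mathbf{v} - \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle - \langle \mathbf{u}, \mathbf{w} \rangle$

(d) $\langle \mathbf{u} - \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle - \langle \mathbf{v}, \mathbf{w} \rangle$

(e) $k\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{u}, k\mathbf{v} \rangle$

We will prove part (b) and leave the proofs of the remaining parts as exercises.

$\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{v} + \mathbf{w}, \mathbf{u} \rangle$ [By symmetry]

$= \langle \mathbf{v}, \mathbf{u} \rangle + \langle \mathbf{w}, \mathbf{u} \rangle$ [By additivity]

$= \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$ [By symmetry]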

The  following  example  illustrates  how  Theorem  6.1.2  and  the  defining  properties  of  inner  products  can  be  used  to  perform 
algebraic  computations  with  inner  products.  As  you  read  through  the  example,  you  will  find  it  instructive  to  justify  the  steps. 

EXAMPLE  12  Calculating  with  Inner  Products 

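$\langle \mathbf{u} - 2\mathbf{v}, 3\mathbf{u} + 4\mathbf{v} \rangle = \langle \mathbf{u}, 3\mathbf{u} + 4\mathbf{v} \rangle - \langle 2\mathbf{v}, 3\mathbf{u} + 4\mathbf{v} \rangle$

$= \langle \mathbf{u}, 3\mathbf{u} \rangle + \langle \mathbf{u}, 4\mathbf{v} \rangle - \langle 2\mathbf{v}, 3\mathbf{u} \rangle - \langle 2\mathbf{v}, 4\mathbf{v} \rangle$

$= 3\langle \mathbf{u}, \mathbf{u} \rangle + 4\langle \mathbf{u}, \mathbf{v} \rangle - 6\langle \mathbf{v}, \mathbf{u} \rangle - 8\langle \mathbf{v}, \mathbf{v} \rangle$

$= 3\|\mathbf{u}\|^2 + 4\langle \mathbf{u}, \mathbf{v} \rangle - 6\langle \mathbf{u}, \mathbf{v} \rangle - 8\|\mathbf{v}\|^2$

$= 3\|\mathbf{u}\|^2 - 2\langle \mathbf{u}, \mathbf{v} \rangle - 8\|\mathbf{v}\|^2$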
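The expansion in Example 12 holds for every inner product, so it can be spot-checked numerically. The sketch below is an illustration only: it assumes the weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 2u_2v_2$ on $R^2$ and two arbitrarily chosen vectors; the helper names `inner` and `combo` are hypothetical.

```python
def inner(u, v, weights=(3, 2)):
    """Weighted Euclidean inner product (weights are an assumption for this demo)."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))


def combo(a, u, b, v):
    """Linear combination a*u + b*v of two vectors given as tuples."""
    return tuple(a * x + b * y for x, y in zip(u, v))


u, v = (1, -2), (4, 5)

# Left side: <u - 2v, 3u + 4v> computed directly.
lhs = inner(combo(1, u, -2, v), combo(3, u, 4, v))
# Right side: 3||u||^2 - 2<u, v> - 8||v||^2 from the expansion.
rhs = 3 * inner(u, u) - 2 * inner(u, v) - 8 * inner(v, v)

print(lhs, rhs)  # both are -735
```

Changing the weights or the vectors should leave the two sides equal, since only the inner product axioms are used in the derivation.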


Concept  Review 

Inner  product  axioms 
Euclidean  inner  product 
Euclidean n-space
Weighted  Euclidean  inner  product 
Unit  circle  (sphere) 

Matrix  inner  product 

Norm  in  an  inner  product  space 

Distance  between  two  vectors  in  an  inner  product  space 
Examples  of  inner  products 
Properties  of  inner  products 

Skills 

Compute  the  inner  product  of  two  vectors. 

Find  the  norm  of  a vector. 

Find  the  distance  between  two  vectors. 


Show  that  a given  formula  defines  an  inner  product. 

Show  that  a given  formula  does  not  define  an  inner  product  by  demonstrating  that  at  least  one  of  the  inner  product 
space  axioms  fails. 


Exercise  Set  6.1 

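1. Let $\langle \mathbf{u}, \mathbf{v} \rangle$ be the Euclidean inner product on $R^2$, and let u = (1, 1), v = (3, 2), w = (0, -1), and k = 3. Compute the
following.

(a) $\langle \mathbf{u}, \mathbf{v} \rangle$

(b) $\langle k\mathbf{v}, \mathbf{w} \rangle$

(c) $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle$

(d) $\|\mathbf{v}\|$

(e) $d(\mathbf{u}, \mathbf{v})$

(f) $\|\mathbf{u} - k\mathbf{v}\|$

Answer:

(a) 5

(b) -6

(c) -3

(d) $\sqrt{13}$

(e) $\sqrt{5}$

(f) $\sqrt{89}$

2. Repeat Exercise 1 for the weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = 2u_1v_1 + 3u_2v_2$.

3. Let $\langle \mathbf{u}, \mathbf{v} \rangle$ be the Euclidean inner product on $R^2$, and let u = (3, -2), v = (4, 5), w = (-1, 6), and k = -4. Verify the
following.

(a) $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$

(b) $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$

(c) $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$

(d) $\langle k\mathbf{u}, \mathbf{v} \rangle = k\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{u}, k\mathbf{v} \rangle$

(e) $\langle \mathbf{0}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{0} \rangle = 0$

Answer:

(a) 2

(b) 11

(c) -13

(d) -8

(e) 0


4. Repeat Exercise 3 for the weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = 4u_1v_1 + 5u_2v_2$.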


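5. Let $\langle \mathbf{u}, \mathbf{v} \rangle$ be the inner product on $R^2$ generated by $\begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$, and let u = (2, 1), v = (-1, 1), w = (0, -1). Compute the
following.

(a) $\langle \mathbf{u}, \mathbf{v} \rangle$

(b) $\langle \mathbf{v}, \mathbf{w} \rangle$

(c) $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle$

(d) $\|\mathbf{v}\|$

(e) $d(\mathbf{v}, \mathbf{w})$

(f) $\|\mathbf{v} - \mathbf{w}\|^2$


Answer:


(a) -5

(b) 1

(c) -7

(d) 1

(e) 1

(f) 1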

6 o 

6. Repeat Exercise 5 for the inner product on $R^2$ generated by
7. Compute $\langle \mathbf{u}, \mathbf{v} \rangle$ using the inner product in Example 6.

(a)u  = 


(b>u  = 


2 -1 


[3  -2' 

-1  3 

00  ' 

, V — 

1 1 

1 2' 

"4  6] 

-3  5 

, V — 

00 

o 

1 

Answer: 

(a)  3 

(b)  56 


8. Compute $\langle \mathbf{p}, \mathbf{q} \rangle$ using the inner product in Example 7.

(a) $\mathbf{p} = -2 + x + 3x^2$, $\mathbf{q} = 4 - 7x^2$

(b) $\mathbf{p} = -5 + 2x + x^2$, $\mathbf{q} = 3 + 2x - 4x^2$


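9. (a) Use Formula 4 to show that $\langle \mathbf{u}, \mathbf{v} \rangle = 9u_1v_1 + 4u_2v_2$ is the inner product on $R^2$ generated by

$$A = \begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}$$

(b) Use the inner product in part (a) to compute $\langle \mathbf{u}, \mathbf{v} \rangle$ if u = (-3, 2) and v = (1, 7).
Answer:

(b) 29

10. (a) Use Formula 4 to show that

$$\langle \mathbf{u}, \mathbf{v} \rangle = 5u_1v_1 - u_1v_2 - u_2v_1 + 10u_2v_2$$

is the inner product on $R^2$ generated by

$$A = \begin{bmatrix} 2 & 1 \\ -1 & 3 \end{bmatrix}$$

(b) Use the inner product in part (a) to compute $\langle \mathbf{u}, \mathbf{v} \rangle$ if u = (0, -3) and v = (6, 2).

11. Let $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$. In each part, the given expression is an inner product on $R^2$. Find a matrix that
generates it.

(a) $\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 5u_2v_2$

(b) $\langle \mathbf{u}, \mathbf{v} \rangle = 4u_1v_1 + 6u_2v_2$

Answer:

(a) $\begin{bmatrix} \sqrt{3} & 0 \\ 0 & \sqrt{5} \end{bmatrix}$

(b) $\begin{bmatrix} 2 & 0 \\ 0 & \sqrt{6} \end{bmatrix}$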

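12. Let $P_2$ have the inner product in Example 7. In each part, find $\|\mathbf{p}\|$.

(a) $\mathbf{p} = -2 + 3x + 2x^2$

(b) $\mathbf{p} = 4 - 3x^2$


13. Let $M_{22}$ have the inner product in Example 6. In each part, find $\|A\|$.

(a) $A = \begin{bmatrix} -2 & 5 \\ 3 & 6 \end{bmatrix}$

(b) $A = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$

Answer:

(a) $\sqrt{74}$

(b) 0


14. Let $P_2$ have the inner product in Example 7. Find $d(\mathbf{p}, \mathbf{q})$.

$\mathbf{p} = 3 - x + x^2$, $\mathbf{q} = 2 + 5x^2$

15. Let $M_{22}$ have the inner product in Example 6. Find $d(A, B)$.

(a) $A = \begin{bmatrix} 2 & 6 \\ 9 & 4 \end{bmatrix}$, $B = \begin{bmatrix} -4 & 7 \\ 1 & 6 \end{bmatrix}$

(b) $A = \begin{bmatrix} -2 & 4 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} -5 & 1 \\ 6 & 2 \end{bmatrix}$

Answer:

(a) $\sqrt{105}$

(b) $\sqrt{47}$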

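16. Let $P_2$ have the inner product of Example 9, and let $\mathbf{p} = 1 + x + x^2$ and $\mathbf{q} = 1 - 2x$. Compute the following.

(a) $\langle \mathbf{p}, \mathbf{q} \rangle$

(b) $\|\mathbf{p}\|$

(c) $d(\mathbf{p}, \mathbf{q})$

17. Let $P_3$ have the evaluation inner product at the sample points

$$x_0 = -1, \quad x_1 = 0, \quad x_2 = 1, \quad x_3 = 2$$

Find $\langle \mathbf{p}, \mathbf{q} \rangle$ and $\|\mathbf{p}\|$ for $\mathbf{p} = x + x^3$ and $\mathbf{q} = 1 + x^2$.


Answer:

$\langle \mathbf{p}, \mathbf{q} \rangle = 50$, $\|\mathbf{p}\| = 6\sqrt{3}$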

18. In each part, use the given inner product on $R^2$ to find $\|\mathbf{w}\|$, where w = (-1, 3).

(a) the Euclidean inner product

(b) the weighted Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 2u_2v_2$, where $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$

(c) the inner product generated by the matrix


19. Use the inner products in Exercise 18 to find $d(\mathbf{u}, \mathbf{v})$ for u = (-1, 2) and v = (2, 5).

Answer:

(a) $3\sqrt{2}$

(b) $3\sqrt{5}$

(c) $3\sqrt{13}$

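20. Suppose that u, v, and w are vectors such that

$\langle \mathbf{u}, \mathbf{v} \rangle = 2$, $\langle \mathbf{v}, \mathbf{w} \rangle = -3$, $\langle \mathbf{u}, \mathbf{w} \rangle = 5$
$\|\mathbf{u}\| = 1$, $\|\mathbf{v}\| = 2$, $\|\mathbf{w}\| = 7$

Evaluate the given expression.

(a) $\langle \mathbf{u} + \mathbf{v}, \mathbf{v} + \mathbf{w} \rangle$

(b) $\langle 2\mathbf{v} - \mathbf{w}, 3\mathbf{u} + 2\mathbf{w} \rangle$

(c) $\langle \mathbf{u} - \mathbf{v} - 2\mathbf{w}, 4\mathbf{u} + \mathbf{v} \rangle$

(d) $\|\mathbf{u} + \mathbf{v}\|$

(e) $\|2\mathbf{w} - \mathbf{v}\|$

(f) $\|\mathbf{u} - 2\mathbf{v} + 4\mathbf{w}\|$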

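21. Sketch the unit circle in $R^2$ using the given inner product.

(a) $\langle \mathbf{u}, \mathbf{v} \rangle = \frac{1}{4}u_1v_1 + \frac{1}{16}u_2v_2$

(b) $\langle \mathbf{u}, \mathbf{v} \rangle = 2u_1v_1 + u_2v_2$
Answer:

(a) [Figure: ellipse in the xy-plane crossing the x-axis at -2 and 2 and the y-axis at -4 and 4]

(b) [Figure]

22. Find a weighted Euclidean inner product on $R^2$ for which the unit circle is the ellipse shown in the accompanying figure.

[Figure Ex-22: ellipse in the xy-plane; the axis marks show 1 on the y-axis and 3 on the x-axis]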

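23. Let $\mathbf{u} = (u_1, u_2)$ and $\mathbf{v} = (v_1, v_2)$. Show that the following are inner products on $R^2$ by verifying that the inner product
axioms hold.

(a) $\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 5u_2v_2$

(b) $\langle \mathbf{u}, \mathbf{v} \rangle = 4u_1v_1 + u_2v_1 + u_1v_2 + 4u_2v_2$

24. Let $\mathbf{u} = (u_1, u_2, u_3)$ and $\mathbf{v} = (v_1, v_2, v_3)$. Determine which of the following are inner products on $R^3$. For those that are
not, list the axioms that do not hold.

(a) $\langle \mathbf{u}, \mathbf{v} \rangle = u_1v_1 + u_3v_3$

(b) $\langle \mathbf{u}, \mathbf{v} \rangle = u_1^2v_1^2 + u_2^2v_2^2 + u_3^2v_3^2$

(c) $\langle \mathbf{u}, \mathbf{v} \rangle = 2u_1v_1 + u_2v_2 + 4u_3v_3$

(d) $\langle \mathbf{u}, \mathbf{v} \rangle = u_1v_1 - u_2v_2 + u_3v_3$


25. Show that the following identity holds for vectors in any inner product space.

$$\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 = 2\|\mathbf{u}\|^2 + 2\|\mathbf{v}\|^2$$

26. Show that the following identity holds for vectors in any inner product space.

$$\langle \mathbf{u}, \mathbf{v} \rangle = \tfrac{1}{4}\|\mathbf{u} + \mathbf{v}\|^2 - \tfrac{1}{4}\|\mathbf{u} - \mathbf{v}\|^2$$

27. Let $U = \begin{bmatrix} u_1 & u_2 \\ u_3 & u_4 \end{bmatrix}$ and $V = \begin{bmatrix} v_1 & v_2 \\ v_3 & v_4 \end{bmatrix}$.

Show that $\langle U, V \rangle = u_1v_1 + u_2v_3 + u_3v_2 + u_4v_4$ is not an inner product on $M_{22}$.

Answer:

For $V = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, we have $\langle V, V \rangle = -2 < 0$, so Axiom 4 fails.

28. Calculus required Let the vector space $P_2$ have the inner product

$$\langle \mathbf{p}, \mathbf{q} \rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

(a) Find $\|\mathbf{p}\|$ for $\mathbf{p} = 1$, $\mathbf{p} = x$, and $\mathbf{p} = x^2$.

(b) Find $d(\mathbf{p}, \mathbf{q})$ if $\mathbf{p} = 1$ and $\mathbf{q} = x$.

29. Calculus required Use the inner product

$$\langle \mathbf{p}, \mathbf{q} \rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

on $P_3$ to compute $\langle \mathbf{p}, \mathbf{q} \rangle$.

(a) $\mathbf{p} = 1 - x + x^2 + 5x^3$, $\mathbf{q} = x - 3x^2$

(b) $\mathbf{p} = x - 5x^3$, $\mathbf{q} = 2 + 8x^2$

Answer:

(a) $-\frac{28}{15}$

(b) 0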


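30. Calculus required In each part, use the inner product

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^1 f(x)g(x)\,dx$$

on $C[0, 1]$ to compute $\langle \mathbf{f}, \mathbf{g} \rangle$.

(a) $\mathbf{f} = \cos 2\pi x$, $\mathbf{g} = \sin 2\pi x$

(b) $\mathbf{f} = x$, $\mathbf{g} = e^x$

(c) $\mathbf{f} = \tan \frac{\pi}{4}x$, $\mathbf{g} = 1$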


31.  Prove  that  Formula  4 defines  an  inner  product  on  Rn. 

32. The definition of a complex vector space was given in the first margin note in Section 4.1. The definition of a complex
inner product on a complex vector space V is identical to Definition 1 except that scalars are allowed to be complex
numbers, and Axiom 1 is replaced by $\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}$. The remaining axioms are unchanged. A complex vector space with
a complex inner product is called a complex inner product space. Prove that if V is a complex inner product space then
$\langle \mathbf{u}, k\mathbf{v} \rangle = \overline{k}\langle \mathbf{u}, \mathbf{v} \rangle$.


True-False  Exercises 


In  parts  (a)-(g)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  The  dot  product  on  R2  is  an  example  of  a weighted  inner  product. 

Answer: 

True 

(b)  The  inner  product  of  two  vectors  cannot  be  a negative  real  number. 

Answer: 

False 

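(c) $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle + \langle \mathbf{w}, \mathbf{u} \rangle$.

Answer:

True

(d) $\langle k\mathbf{u}, k\mathbf{v} \rangle = k^2\langle \mathbf{u}, \mathbf{v} \rangle$.

Answer:

True

(e) If $\langle \mathbf{u}, \mathbf{v} \rangle = 0$, then $\mathbf{u} = \mathbf{0}$ or $\mathbf{v} = \mathbf{0}$.

Answer:

False

(f) If $\|\mathbf{v}\|^2 = 0$, then $\mathbf{v} = \mathbf{0}$.

Answer:

True

(g) If A is an n x n matrix, then $\langle \mathbf{u}, \mathbf{v} \rangle = A\mathbf{u} \cdot A\mathbf{v}$ defines an inner product on $R^n$.

Answer:

False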
6.2 Angle and Orthogonality in Inner Product Spaces

In Section 3.2 we defined the notion of "angle" between vectors in $R^n$. In this section we will extend this idea
to general vector spaces. This will enable us to extend the notion of orthogonality as well, thereby setting the
groundwork for a variety of new applications.


Cauchy-Schwarz  Inequality 

Recall from Formula 20 of Section 3.2 that the angle $\theta$ between two vectors u and v in $R^n$ is

$$\theta = \cos^{-1}\left(\frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right) \qquad (1)$$

We were assured that this formula was valid because it followed from the Cauchy-Schwarz inequality
(Theorem 3.2.4) that

$$-1 \le \frac{\mathbf{u} \cdot \mathbf{v}}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \le 1 \qquad (2)$$

as required for the inverse cosine to be defined. The following generalization of Theorem 3.2.4 will enable us
to define the angle between two vectors in any real inner product space.


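THEOREM 6.2.1 Cauchy-Schwarz Inequality

If u and v are vectors in a real inner product space V, then

$$|\langle \mathbf{u}, \mathbf{v} \rangle| \le \|\mathbf{u}\|\,\|\mathbf{v}\| \qquad (3)$$


Proof We warn you in advance that the proof presented here depends on a clever trick that is not easy to
motivate.

In the case where $\mathbf{u} = \mathbf{0}$ the two sides of 3 are equal since $\langle \mathbf{u}, \mathbf{v} \rangle$ and $\|\mathbf{u}\|$ are both zero. Thus, we need only
consider the case where $\mathbf{u} \ne \mathbf{0}$. Making this assumption, let

$$a = \langle \mathbf{u}, \mathbf{u} \rangle, \quad b = 2\langle \mathbf{u}, \mathbf{v} \rangle, \quad c = \langle \mathbf{v}, \mathbf{v} \rangle$$

and let t be any real number. Since the positivity axiom states that the inner product of any vector with itself is
nonnegative, it follows that

$$0 \le \langle t\mathbf{u} + \mathbf{v}, t\mathbf{u} + \mathbf{v} \rangle = \langle \mathbf{u}, \mathbf{u} \rangle t^2 + 2\langle \mathbf{u}, \mathbf{v} \rangle t + \langle \mathbf{v}, \mathbf{v} \rangle = at^2 + bt + c$$

This inequality implies that the quadratic polynomial $at^2 + bt + c$ has either no real roots or a repeated real
root. Therefore, its discriminant must satisfy the inequality $b^2 - 4ac \le 0$. Expressing the coefficients a, b,
and c in terms of the vectors u and v gives $4\langle \mathbf{u}, \mathbf{v} \rangle^2 - 4\langle \mathbf{u}, \mathbf{u} \rangle\langle \mathbf{v}, \mathbf{v} \rangle \le 0$ or, equivalently,

$$\langle \mathbf{u}, \mathbf{v} \rangle^2 \le \langle \mathbf{u}, \mathbf{u} \rangle\langle \mathbf{v}, \mathbf{v} \rangle$$

Taking square roots of both sides and using the fact that $\langle \mathbf{u}, \mathbf{u} \rangle$ and $\langle \mathbf{v}, \mathbf{v} \rangle$ are nonnegative yields

$$|\langle \mathbf{u}, \mathbf{v} \rangle| \le \langle \mathbf{u}, \mathbf{u} \rangle^{1/2}\langle \mathbf{v}, \mathbf{v} \rangle^{1/2} \quad \text{or equivalently} \quad |\langle \mathbf{u}, \mathbf{v} \rangle| \le \|\mathbf{u}\|\,\|\mathbf{v}\|$$

which completes the proof.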


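The following two alternative forms of the Cauchy-Schwarz inequality are useful to know:

$$\langle \mathbf{u}, \mathbf{v} \rangle^2 \le \langle \mathbf{u}, \mathbf{u} \rangle\langle \mathbf{v}, \mathbf{v} \rangle \qquad (4)$$

$$\langle \mathbf{u}, \mathbf{v} \rangle^2 \le \|\mathbf{u}\|^2\|\mathbf{v}\|^2 \qquad (5)$$

The first of these formulas was obtained in the proof of Theorem 6.2.1, and the second is a variation of the
first.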
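The link between the discriminant in the proof and the Cauchy-Schwarz inequality can be illustrated numerically. This is a minimal Python sketch, assuming the Euclidean inner product; the vectors are borrowed from Exercise 11 of Exercise Set 6.2, and the function name `cauchy_schwarz_gap` is hypothetical.

```python
def inner(u, v):
    """Euclidean inner product (dot product) of two tuples."""
    return sum(a * b for a, b in zip(u, v))


def cauchy_schwarz_gap(u, v):
    """<u,u><v,v> - <u,v>^2, which the inequality says is never negative."""
    return inner(u, u) * inner(v, v) - inner(u, v) ** 2


u, v = (3, 2), (4, -1)
a, b, c = inner(u, u), 2 * inner(u, v), inner(v, v)
discriminant = b * b - 4 * a * c

print(discriminant)              # -484: at^2 + bt + c has no real roots
print(cauchy_schwarz_gap(u, v))  # 121, which is -discriminant / 4

# For linearly dependent vectors the gap collapses to zero (equality case).
print(cauchy_schwarz_gap((-4, 2, 1), (8, -4, -2)))  # 0
```

The discriminant is always $-4$ times the gap, so a nonpositive discriminant and a nonnegative gap are the same statement.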


Angle  Between  Vectors 


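Our next goal is to define what is meant by the "angle" between vectors in a real inner product space. As the
first step, we leave it for you to use the Cauchy-Schwarz inequality to show that

$$-1 \le \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \le 1 \qquad (6)$$

This being the case, there is a unique angle $\theta$ in radian measure for which

$$\cos\theta = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|} \quad \text{and} \quad 0 \le \theta \le \pi \qquad (7)$$

(Figure 6.2.1). This enables us to define the angle $\theta$ between u and v to be

$$\theta = \cos^{-1}\left(\frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|}\right) \qquad (8)$$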


Figure  6.2.1 


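EXAMPLE 1 Cosine of an Angle Between Two Vectors in $R^4$

Let $R^4$ have the Euclidean inner product. Find the cosine of the angle $\theta$ between the vectors
u = (4, 3, 1, -2) and v = (-2, 1, 2, 3).

We leave it for you to verify that

$$\|\mathbf{u}\| = \sqrt{30}, \quad \|\mathbf{v}\| = \sqrt{18}, \quad \text{and} \quad \langle \mathbf{u}, \mathbf{v} \rangle = -9$$

from which it follows that

$$\cos\theta = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\|\,\|\mathbf{v}\|} = -\frac{9}{\sqrt{30}\sqrt{18}} = -\frac{3}{2\sqrt{15}}$$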
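The verification left to the reader in Example 1 can be sketched in a few lines of Python. The helper names `dot` and `cos_angle` are illustrative assumptions; the sketch uses only the Euclidean inner product.

```python
import math


def dot(u, v):
    """Euclidean inner product of two tuples."""
    return sum(a * b for a, b in zip(u, v))


def cos_angle(u, v):
    """cos(theta) = <u, v> / (||u|| ||v||) for nonzero vectors u and v."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


u = (4, 3, 1, -2)
v = (-2, 1, 2, 3)

print(dot(u, u), dot(v, v), dot(u, v))  # 30 18 -9
c = cos_angle(u, v)
print(c)             # about -0.387, that is -3 / (2 * sqrt(15))
print(math.acos(c))  # the angle theta in radians, between 0 and pi
```

Because the cosine is negative, the angle between u and v is obtuse.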


Properties  of  Length  and  Distance  in  General  Inner  Product  Spaces 

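In Section 3.2 we used the dot product to extend the notions of length and distance to $R^n$, and we showed that
various familiar theorems remained valid (see Theorem 3.2.5, Theorem 3.2.6, and Theorem 3.2.7). By making
only minor adjustments to the proofs of those theorems, we can show that they remain valid in any real inner
product space. For example, here is the generalization of Theorem 3.2.5 (the triangle inequalities).


THEOREM 6.2.2

If u, v, and w are vectors in a real inner product space V, and if k is any scalar, then:

(a) $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$ [Triangle inequality for vectors]

(b) $d(\mathbf{u}, \mathbf{v}) \le d(\mathbf{u}, \mathbf{w}) + d(\mathbf{w}, \mathbf{v})$ [Triangle inequality for distances]


Proof (a)

$\|\mathbf{u} + \mathbf{v}\|^2 = \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle$

$= \langle \mathbf{u}, \mathbf{u} \rangle + 2\langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{v} \rangle$

$\le \langle \mathbf{u}, \mathbf{u} \rangle + 2|\langle \mathbf{u}, \mathbf{v} \rangle| + \langle \mathbf{v}, \mathbf{v} \rangle$ [Property of absolute value]

$\le \langle \mathbf{u}, \mathbf{u} \rangle + 2\|\mathbf{u}\|\,\|\mathbf{v}\| + \langle \mathbf{v}, \mathbf{v} \rangle$ [By (3)]

$= \|\mathbf{u}\|^2 + 2\|\mathbf{u}\|\,\|\mathbf{v}\| + \|\mathbf{v}\|^2$

$= (\|\mathbf{u}\| + \|\mathbf{v}\|)^2$

Taking square roots gives $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$.


Proof (b) Identical to the proof of part (b) of Theorem 3.2.5.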


Orthogonality 

Although Example 1 is a useful mathematical exercise, there is only an occasional need to compute angles in
vector spaces other than $R^2$ and $R^3$. A problem of more interest in general vector spaces is ascertaining
whether the angle between vectors is $\pi/2$. You should be able to see from Formula 8 that if u and v are
nonzero vectors, then the angle between them is $\theta = \pi/2$ if and only if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$. Accordingly, we make the
following definition (which is applicable even if one or both of the vectors is zero).


DEFINITION 1

Two vectors u and v in an inner product space are called orthogonal if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$.


As  the  following  example  shows,  orthogonality  depends  on  the  inner  product  in  the  sense  that  for  different 
inner  products  two  vectors  can  be  orthogonal  with  respect  to  one  but  not  the  other. 

EXAMPLE  2 Orthogonality  Depends  on  the  Inner  Product 

The vectors u = (1, 1) and v = (1, -1) are orthogonal with respect to the Euclidean inner
product on $R^2$, since

$$\mathbf{u} \cdot \mathbf{v} = (1)(1) + (1)(-1) = 0$$

However, they are not orthogonal with respect to the weighted Euclidean inner product
$\langle \mathbf{u}, \mathbf{v} \rangle = 3u_1v_1 + 2u_2v_2$, since

$$\langle \mathbf{u}, \mathbf{v} \rangle = 3(1)(1) + 2(1)(-1) = 1 \ne 0$$
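Example 2 can be replayed in code by treating the Euclidean inner product as the weighted one with all weights equal to 1. This is a minimal Python sketch; the function names `weighted_inner` and `is_orthogonal` are assumptions made for illustration.

```python
def weighted_inner(u, v, weights):
    """Weighted Euclidean inner product: sum of w_i * u_i * v_i."""
    return sum(w * a * b for w, a, b in zip(weights, u, v))


def is_orthogonal(u, v, weights):
    """Two vectors are orthogonal when their inner product is zero."""
    return weighted_inner(u, v, weights) == 0


u, v = (1, 1), (1, -1)

print(weighted_inner(u, v, (1, 1)))  # 0: orthogonal for the Euclidean inner product
print(weighted_inner(u, v, (3, 2)))  # 1: not orthogonal for weights 3 and 2
print(is_orthogonal(u, v, (1, 1)), is_orthogonal(u, v, (3, 2)))  # True False
```

The same pair of vectors gives different answers because orthogonality is a property of the vectors together with the chosen inner product.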


EXAMPLE  3 Orthogonal  Vectors  in  M22 


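If $M_{22}$ has the inner product of Example 6 in the preceding section, then the matrices

$$U = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix}$$

are orthogonal, since

$$\langle U, V \rangle = 1(0) + 0(2) + 1(0) + 1(0) = 0$$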


CALCULUS  REQUIRED 


EXAMPLE  4 Orthogonal  Vectors  in  P2 


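Let $P_2$ have the inner product

$$\langle \mathbf{p}, \mathbf{q} \rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

and let $\mathbf{p} = x$ and $\mathbf{q} = x^2$. Then

$$\|\mathbf{p}\| = \langle \mathbf{p}, \mathbf{p} \rangle^{1/2} = \left[\int_{-1}^{1} x^2\,dx\right]^{1/2} = \sqrt{\tfrac{2}{3}}$$

$$\|\mathbf{q}\| = \langle \mathbf{q}, \mathbf{q} \rangle^{1/2} = \left[\int_{-1}^{1} x^4\,dx\right]^{1/2} = \sqrt{\tfrac{2}{5}}$$

$$\langle \mathbf{p}, \mathbf{q} \rangle = \int_{-1}^{1} x \cdot x^2\,dx = \int_{-1}^{1} x^3\,dx = 0$$

Because $\langle \mathbf{p}, \mathbf{q} \rangle = 0$, the vectors $\mathbf{p} = x$ and $\mathbf{q} = x^2$ are orthogonal relative to the given inner
product.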


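In Section 3.3 we proved the Theorem of Pythagoras for vectors in Euclidean n-space. The following theorem
extends this result to vectors in any real inner product space.

THEOREM 6.2.3 Generalized Theorem of Pythagoras

If u and v are orthogonal vectors in an inner product space, then

$$\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$$

Proof The orthogonality of u and v implies that $\langle \mathbf{u}, \mathbf{v} \rangle = 0$, so

$$\|\mathbf{u} + \mathbf{v}\|^2 = \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle = \|\mathbf{u}\|^2 + 2\langle \mathbf{u}, \mathbf{v} \rangle + \|\mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$$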

CALCULUS  REQUIRED 


EXAMPLE  5 Theorem  of  Pythagoras  in  P2 


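In Example 4 we showed that $\mathbf{p} = x$ and $\mathbf{q} = x^2$ are orthogonal with respect to the inner product

$$\langle \mathbf{p}, \mathbf{q} \rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

on $P_2$. It follows from Theorem 6.2.3 that

$$\|\mathbf{p} + \mathbf{q}\|^2 = \|\mathbf{p}\|^2 + \|\mathbf{q}\|^2$$

Thus, from the computations in Example 4, we have

$$\|\mathbf{p} + \mathbf{q}\|^2 = \left(\sqrt{\tfrac{2}{3}}\right)^2 + \left(\sqrt{\tfrac{2}{5}}\right)^2 = \tfrac{2}{3} + \tfrac{2}{5} = \tfrac{16}{15}$$

We can check this result by direct integration:

$$\|\mathbf{p} + \mathbf{q}\|^2 = \langle \mathbf{p} + \mathbf{q}, \mathbf{p} + \mathbf{q} \rangle = \int_{-1}^{1} (x + x^2)(x + x^2)\,dx$$

$$= \int_{-1}^{1} x^2\,dx + 2\int_{-1}^{1} x^3\,dx + \int_{-1}^{1} x^4\,dx = \tfrac{2}{3} + 0 + \tfrac{2}{5} = \tfrac{16}{15}$$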
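The check by direct integration can also be carried out exactly in Python. The sketch below is an illustration under stated assumptions: polynomials are stored as coefficient lists `[a0, a1, a2, ...]`, the inner product is the integral over $[-1, 1]$, and the helper names `poly_mul`, `poly_add`, and `inner` are hypothetical. Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Sum of two polynomials, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def inner(p, q):
    """Exact integral of p(x) q(x) over [-1, 1]."""
    total = Fraction(0)
    for k, c in enumerate(poly_mul(p, q)):
        if k % 2 == 0:  # odd powers integrate to 0 over [-1, 1]
            total += Fraction(2 * c, k + 1)
    return total


p = [0, 1]     # x
q = [0, 0, 1]  # x^2
s = poly_add(p, q)

print(inner(p, q))                # 0, so p and q are orthogonal
print(inner(p, p) + inner(q, q))  # 16/15
print(inner(s, s))                # 16/15, matching the Theorem of Pythagoras
```

The last two printed values agree, as the generalized Theorem of Pythagoras predicts for orthogonal vectors.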


Orthogonal  Complements 

In  Section  4.8  we  defined  the  notion  of  an  orthogonal  complement  for  subspaces  of  Rn,  and  we  used  that 
definition  to  establish  a geometric  link  between  the  fundamental  spaces  of  a matrix.  The  following  definition 
extends  that  idea  to  general  inner  product  spaces. 


DEFINITION  2 

If W is a subspace of an inner product space V, then the set of all vectors in V that are orthogonal to
every vector in W is called the orthogonal complement of W and is denoted by the symbol $W^\perp$.


In  Theorem  4.8.8  we  stated  three  properties  of  orthogonal  complements  in  Rn . The  following  theorem 
generalizes  parts  (a)  and  (b)  of  that  theorem  to  general  inner  product  spaces. 


THEOREM  6.2.4 


If W is a subspace of an inner product space V, then:

(a) $W^\perp$ is a subspace of V.

(b) $W \cap W^\perp = \{\mathbf{0}\}$.


Proof (a) The set $W^\perp$ contains at least the zero vector, since $\langle \mathbf{0}, \mathbf{w} \rangle = 0$ for every vector w in W. Thus, it
remains to show that $W^\perp$ is closed under addition and scalar multiplication. To do this, suppose that u and v
are vectors in $W^\perp$, so that for every vector w in W we have $\langle \mathbf{u}, \mathbf{w} \rangle = 0$ and $\langle \mathbf{v}, \mathbf{w} \rangle = 0$. It follows from the
additivity and homogeneity axioms of inner products that

$$\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle = 0 + 0 = 0$$

$$\langle k\mathbf{u}, \mathbf{w} \rangle = k\langle \mathbf{u}, \mathbf{w} \rangle = k(0) = 0$$

which proves that $\mathbf{u} + \mathbf{v}$ and $k\mathbf{u}$ are in $W^\perp$.

Proof (b) If v is any vector in both W and $W^\perp$, then v is orthogonal to itself; that is, $\langle \mathbf{v}, \mathbf{v} \rangle = 0$. It follows
from the positivity axiom for inner products that $\mathbf{v} = \mathbf{0}$.

The next theorem, which we state without proof, generalizes part (c) of Theorem 4.8.8. Note, however, that
this theorem applies only to finite-dimensional inner product spaces, whereas Theorem 6.2.4 does not have
this restriction.


THEOREM 6.2.5

If W is a subspace of a finite-dimensional inner product space V, then the orthogonal
complement of $W^\perp$ is W; that is,

$$(W^\perp)^\perp = W$$

Theorem 6.2.5 implies that in a finite-dimensional inner product space orthogonal complements occur in pairs,
each being orthogonal to the other (Figure 6.2.2).

Figure 6.2.2 Each vector in W is orthogonal to each vector in $W^\perp$ and conversely.


In  our  study  of  the  fundamental  spaces  of  a matrix  in  Section  4.8  we  showed  that  the  row  space  and  null  space 
of  a matrix  are  orthogonal  complements  with  respect  to  the  Euclidean  inner  product  on  Rn  (Theorem  4.8.9). 
The  following  example  takes  advantage  of  that  fact. 

EXAMPLE  6 Basis  for  an  Orthogonal  Complement 

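Let W be the subspace of $R^6$ spanned by the vectors

$\mathbf{w}_1 = (1, 3, -2, 0, 2, 0)$, $\mathbf{w}_2 = (2, 6, -5, -2, 4, -3)$,

$\mathbf{w}_3 = (0, 0, 5, 10, 0, 15)$, $\mathbf{w}_4 = (2, 6, 0, 8, 4, 18)$

Find a basis for the orthogonal complement of W.


The space W is the same as the row space of the matrix

$$A = \begin{bmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{bmatrix}$$

Since the row space and null space of A are orthogonal complements, our problem reduces to
finding a basis for the null space of this matrix. In Example 4 of Section 4.7 we showed that

$$\mathbf{v}_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$

form a basis for this null space. Expressing these vectors in comma-delimited form (to match
that of $\mathbf{w}_1$, $\mathbf{w}_2$, $\mathbf{w}_3$, and $\mathbf{w}_4$), we obtain the basis vectors

$\mathbf{v}_1 = (-3, 1, 0, 0, 0, 0)$, $\mathbf{v}_2 = (-4, 0, -2, 1, 0, 0)$, $\mathbf{v}_3 = (-2, 0, 0, 0, 1, 0)$

You may want to check that these vectors are orthogonal to $\mathbf{w}_1$, $\mathbf{w}_2$, $\mathbf{w}_3$, and $\mathbf{w}_4$ by computing
the necessary dot products.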
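The suggested check of Example 6 amounts to twelve dot products, which a short Python sketch can carry out. The variable names `W_vectors` and `V_vectors` are assumptions for illustration; the vectors themselves are the ones listed in the example.

```python
def dot(u, v):
    """Euclidean inner product of two tuples."""
    return sum(a * b for a, b in zip(u, v))


# Spanning vectors of W (the rows of the matrix A).
W_vectors = [
    (1, 3, -2, 0, 2, 0),
    (2, 6, -5, -2, 4, -3),
    (0, 0, 5, 10, 0, 15),
    (2, 6, 0, 8, 4, 18),
]

# Basis vectors found for the null space of A.
V_vectors = [
    (-3, 1, 0, 0, 0, 0),
    (-4, 0, -2, 1, 0, 0),
    (-2, 0, 0, 0, 1, 0),
]

# One row of dot products per spanning vector w_i.
products = [[dot(w, v) for v in V_vectors] for w in W_vectors]
for row in products:
    print(row)  # every row is [0, 0, 0]
```

Every product is zero, so each $\mathbf{v}_j$ is orthogonal to each spanning vector of W and therefore to every vector in W.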


Concept  Review 

Cauchy-Schwarz  inequality 
Angle  between  vectors 
Orthogonal  vectors 
Orthogonal  complement 


Skills 


Find  the  angle  between  two  vectors  in  an  inner  product  space. 

Determine  whether  two  vectors  in  an  inner  product  space  are  orthogonal. 

Find  a basis  for  the  orthogonal  complement  of  a subspace  of  an  inner  product  space. 


Exercise  Set  6.2 

1. Let $R^2$, $R^3$, and $R^4$ have the Euclidean inner product. In each part, find the cosine of the angle between u
and v.

(a) u = (1, -3), v = (2, 4)

(b) u = (-1, 0), v = (3, 8)

(c) u = (-1, 5, 2), v = (2, 4, -9)

(d) u = (4, 1, 8), v = (1, 0, -3)

(e) u = (1, 0, 1, 0), v = (-3, -3, -3, -3)

(f) u = (2, 1, 7, -1), v = (4, 0, 0, 0)


Answer:

(a) $-\frac{1}{\sqrt{2}}$

(b) $-\frac{3}{\sqrt{73}}$

(c) 0

(d) $-\frac{20}{9\sqrt{10}}$

(e) $-\frac{1}{\sqrt{2}}$

(f) $\frac{2}{\sqrt{55}}$

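2. Let $P_2$ have the inner product in Example 7 of Section 6.1. Find the cosine of the angle between p and q.

(a) $\mathbf{p} = -1 + 5x + 2x^2$, $\mathbf{q} = 2 + 4x - 9x^2$

(b) $\mathbf{p} = x - x^2$, $\mathbf{q} = 7 + 3x + 3x^2$


3. Let $M_{22}$ have the inner product in Example 6 of Section 6.1. Find the cosine of the angle between A and
B.

(a) $A = \begin{bmatrix} 2 & 6 \\ 1 & -3 \end{bmatrix}$, $B = \begin{bmatrix} 3 & 2 \\ 1 & 0 \end{bmatrix}$

(b) $A = \begin{bmatrix} 2 & 4 \\ -1 & 3 \end{bmatrix}$, $B = \begin{bmatrix} -3 & 1 \\ 4 & 2 \end{bmatrix}$

Answer:

(a) $\frac{19}{10\sqrt{7}}$

(b) 0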

4. In each part, determine whether the given vectors are orthogonal with respect to the Euclidean inner
product.

(a) u = (-1, 3, 2), v = (4, 2, -1)

(b) u = (-2, -2, -2), v = (1, 1, 1)

(c) $\mathbf{u} = (u_1, u_2, u_3)$, v = (0, 0, 0)

(d) u = (-4, 6, -10, 1), v = (2, 1, -2, 9)

(e) u = (0, 3, -2, 1), v = (5, 2, -1, 0)

(f) u = (a, b), v = (-b, a)

5. Show that $\mathbf{p} = 1 - x + 2x^2$ and $\mathbf{q} = 2x + x^2$ are orthogonal with respect to the inner product in Exercise
2.

6.  Let 


Which  of  the  following  matrices  are  orthogonal  to  A with  respect  to  the  inner  product  in  Exercise  3? 


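(a) $\begin{bmatrix} -3 & 0 \\ 0 & 2 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} 2 & 1 \\ 5 & 2 \end{bmatrix}$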

7. Do there exist scalars k and l such that the vectors u = (2, k, 6), v = (l, 5, 3), and w = (1, 2, 3) are
mutually orthogonal with respect to the Euclidean inner product?

Answer:

No

8. Let $R^3$ have the Euclidean inner product, and suppose that u = (1, 1, -1) and v = (6, 7, -15). Find a
value of k for which $\|k\mathbf{u} + \mathbf{v}\| = 13$.

9. Let $R^3$ have the Euclidean inner product. For which values of k are u and v orthogonal?

(a) u = (2, 1, 3), v = (1, 7, k)

(b) u = (k, k, 1), v = (k, 5, 6)


Answer:


(a) k = -3

(b) k = -2, -3

10. Let $R^4$ have the Euclidean inner product. Find two unit vectors that are orthogonal to all three of the
vectors u = (2, 1, -4, 0), v = (-1, -1, 2, 2), and w = (3, 2, 5, 4).

11. In each part, verify that the Cauchy-Schwarz inequality holds for the given vectors using the Euclidean
inner product.

(a) u = (3, 2), v = (4, -1)

(b) u = (-3, 1, 0), v = (2, -1, 3)

(c) u = (-4, 2, 1), v = (8, -4, -2)

(d) u = (0, -2, 2, 1), v = (-1, -1, 1, 1)

12. In each part, verify that the Cauchy-Schwarz inequality holds for the given vectors.

(a) u = (-2, 1) and v = (1, 0) using the inner product of Example 1 of Section 6.1.

using  the  inner  product  in  Example  6 of  Section  6.1  . 

(c) $\mathbf{p} = -1 + 2x + x^2$ and $\mathbf{q} = 2 - 4x$ using the inner product given in Example 7 of Section 6.1.

13. Let $R^4$ have the Euclidean inner product, and let u = (-1, 1, 0, 2). Determine whether the vector u is
orthogonal to the subspace spanned by the vectors $\mathbf{w}_1 = (0, 0, 0, 0)$, $\mathbf{w}_2 = (1, -1, 3, 0)$, and
$\mathbf{w}_3 = (4, 0, 9, 2)$.

Answer: 

No 

In  Exercises  14-15,  assume  that  Rn  has  the  Euclidean  inner  product. 

14. Let W be the line in $R^2$ with equation y = 2x. Find an equation for $W^\perp$.

15. (a) Let W be the plane in $R^3$ with equation x - 2y - 3z = 0. Find parametric equations for $W^\perp$.

(b) Let W be the line in $R^3$ with parametric equations

x = 2t, y = -5t, z = 4t

Find an equation for $W^\perp$.

(c) Let W be the intersection of the two planes

x + y + z = 0 and x - y + z = 0

in $R^3$. Find an equation for $W^\perp$.

Answer:


(a) x = t, y = -2t, z = -3t


(b) 2x - 5y + 4z = 0

(c) x - z = 0

16. Find a basis for the orthogonal complement of the subspace of $R^n$ spanned by the vectors.

(a) $\mathbf{v}_1 = (1, -1, 3)$, $\mathbf{v}_2 = (5, -4, -4)$, $\mathbf{v}_3 = (7, -6, 2)$

(b) $\mathbf{v}_1 = (2, 0, -1)$, $\mathbf{v}_2 = (4, 0, -2)$

(c) $\mathbf{v}_1 = (1, 4, 5, 2)$, $\mathbf{v}_2 = (2, 1, 3, 0)$, $\mathbf{v}_3 = (-1, 3, 2, 2)$

(d) $\mathbf{v}_1 = (1, 4, 5, 6, 9)$, $\mathbf{v}_2 = (3, -2, 1, 4, -1)$, $\mathbf{v}_3 = (-1, 0, -1, -2, -1)$, $\mathbf{v}_4 = (2, 3, 5, 7, 8)$

17. Let V be an inner product space. Show that if u and v are orthogonal unit vectors in V, then $\|\mathbf{u} - \mathbf{v}\| = \sqrt{2}$.

18. Let V be an inner product space. Show that if w is orthogonal to both $\mathbf{u}_1$ and $\mathbf{u}_2$, then it is orthogonal to
$k_1\mathbf{u}_1 + k_2\mathbf{u}_2$ for all scalars $k_1$ and $k_2$. Interpret this result geometrically in the case where V is $R^2$ with
the Euclidean inner product.

19. Let V be an inner product space. Show that if w is orthogonal to each of the vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r$, then it
is orthogonal to every vector in span$\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$.

20. Let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ be a basis for an inner product space V. Show that the zero vector is the only vector
in V that is orthogonal to all of the basis vectors.

21. Let $\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_k\}$ be a basis for a subspace W of V. Show that $W^\perp$ consists of all vectors in V that are
orthogonal to every basis vector.

22. Prove the following generalization of Theorem 6.2.3: If $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ are pairwise orthogonal vectors in
an inner product space V, then

$$\|\mathbf{v}_1 + \mathbf{v}_2 + \cdots + \mathbf{v}_r\|^2 = \|\mathbf{v}_1\|^2 + \|\mathbf{v}_2\|^2 + \cdots + \|\mathbf{v}_r\|^2$$

23. Prove: If u and v are n x 1 matrices and A is an n x n matrix, then

$$(\mathbf{v}^T A^T A \mathbf{u})^2 \le (\mathbf{u}^T A^T A \mathbf{u})(\mathbf{v}^T A^T A \mathbf{v})$$

24. Use the Cauchy-Schwarz inequality to prove that for all real values of a, b, and $\theta$,

$$(a\cos\theta + b\sin\theta)^2 \le a^2 + b^2$$

25. Prove: If $w_1, w_2, \ldots, w_n$ are positive real numbers, and if $\mathbf{u} = (u_1, u_2, \ldots, u_n)$ and $\mathbf{v} = (v_1, v_2, \ldots, v_n)$
are any two vectors in $R^n$, then

$$|w_1u_1v_1 + w_2u_2v_2 + \cdots + w_nu_nv_n| \le (w_1u_1^2 + w_2u_2^2 + \cdots + w_nu_n^2)^{1/2}(w_1v_1^2 + w_2v_2^2 + \cdots + w_nv_n^2)^{1/2}$$

26. Show that equality holds in the Cauchy-Schwarz inequality if and only if u and v are linearly dependent.

27.  Use  vector  methods  to  prove  that  a triangle  that  is  inscribed  in  a circle  so  that  it  has  a diameter  for  a side 
must  be  a right  triangle.  [Hint:  Express  the  vectors  and  gQ  in  the  accompanying  figure  in  terms  of  u 
andv.] 


R 


Figure  Ex-27 

28. As illustrated in the accompanying figure, the vectors $\mathbf{u} = (1, \sqrt{3})$ and $\mathbf{v} = (-1, \sqrt{3})$ have norm 2 and
an angle of 60° between them relative to the Euclidean inner product. Find a weighted Euclidean inner
product with respect to which u and v are orthogonal unit vectors.

[Figure Ex-28: the vectors $(-1, \sqrt{3})$ and $(1, \sqrt{3})$ drawn in the xy-plane with a 60° angle between them]


29. Calculus required  Let $f(x)$ and $g(x)$ be continuous functions on $[0, 1]$. Prove:

(a) $$\left[\int_0^1 f(x)g(x)\,dx\right]^2 \le \left[\int_0^1 f^2(x)\,dx\right]\left[\int_0^1 g^2(x)\,dx\right]$$

(b) $$\left[\int_0^1 [f(x) + g(x)]^2\,dx\right]^{1/2} \le \left[\int_0^1 f^2(x)\,dx\right]^{1/2} + \left[\int_0^1 g^2(x)\,dx\right]^{1/2}$$

[Hint: Use the Cauchy-Schwarz inequality.]

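30. Calculus required  Let $C[0, \pi]$ have the inner product

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^{\pi} f(x)g(x)\,dx$$

and let $\mathbf{f}_n = \cos nx$ $(n = 0, 1, 2, \ldots)$. Show that if $k \ne l$, then $\mathbf{f}_k$ and $\mathbf{f}_l$ are orthogonal vectors.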

31. (a) Let $W$ be the line $y = x$ in an $xy$-coordinate system in $R^2$. Describe the subspace $W^{\perp}$.

(b) Let $W$ be the $y$-axis in an $xyz$-coordinate system in $R^3$. Describe the subspace $W^{\perp}$.

(c) Let $W$ be the $yz$-plane of an $xyz$-coordinate system in $R^3$. Describe the subspace $W^{\perp}$.


Answer: 

(a)  The  line  y = — x 

(b)  The  xz-plane 

(c)  The  x-axis 

32.  Prove  that  Formula  4 holds  for  all  nonzero  vectors  u and  v in  an  inner  product  space  V. 


True-False  Exercises 


In  parts  (a)-(f)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

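(a) If $\mathbf{u}$ is orthogonal to every vector of a subspace $W$, then $\mathbf{u} = \mathbf{0}$.
Answer:

False

(b) If $\mathbf{u}$ is a vector in both $W$ and $W^{\perp}$, then $\mathbf{u} = \mathbf{0}$.
Answer:

True

(c) If $\mathbf{u}$ and $\mathbf{v}$ are vectors in $W^{\perp}$, then $\mathbf{u} + \mathbf{v}$ is in $W^{\perp}$.

Answer:

True

(d) If $\mathbf{u}$ is a vector in $W^{\perp}$ and $k$ is a real number, then $k\mathbf{u}$ is in $W^{\perp}$.

Answer:

True

(e) If $\mathbf{u}$ and $\mathbf{v}$ are orthogonal, then $|\langle \mathbf{u}, \mathbf{v} \rangle| = \|\mathbf{u}\|\,\|\mathbf{v}\|$.

Answer:

False

(f) If $\mathbf{u}$ and $\mathbf{v}$ are orthogonal, then $\|\mathbf{u} + \mathbf{v}\| = \|\mathbf{u}\| + \|\mathbf{v}\|$.

Answer:

False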
6.3  Gram-Schmidt Process; QR-Decomposition

In  many  problems  involving  vector  spaces,  the  problem  solver  is  free  to  choose  any  basis  for  the  vector  space  that 
seems  appropriate.  In  inner  product  spaces,  the  solution  of  a problem  is  often  greatly  simplified  by  choosing  a basis 
in  which  the  vectors  are  orthogonal  to  one  another.  In  this  section  we  will  show  how  such  bases  can  be  obtained. 


Orthogonal  and  Orthonormal  Sets 

Recall  from  Section  6.2  that  two  vectors  in  an  inner  product  space  are  said  to  be  orthogonal  if  their  inner  product  is 
zero.  The  following  definition  extends  the  notion  of  orthogonality  to  sets  of  vectors  in  an  inner  product  space. 


DEFINITION  1 

A set of two or more vectors in a real inner product space is said to be orthogonal if all pairs of distinct vectors in the set are orthogonal. An orthogonal set in which each vector has norm 1 is said to be orthonormal.


EXAMPLE  1 An  Orthogonal  Set  in  R3 

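Let

$$\mathbf{u}_1 = (0, 1, 0), \quad \mathbf{u}_2 = (1, 0, 1), \quad \mathbf{u}_3 = (1, 0, -1)$$

and assume that $R^3$ has the Euclidean inner product. It follows that the set of vectors $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is orthogonal since $\langle \mathbf{u}_1, \mathbf{u}_2 \rangle = \langle \mathbf{u}_1, \mathbf{u}_3 \rangle = \langle \mathbf{u}_2, \mathbf{u}_3 \rangle = 0$.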


If $\mathbf{v}$ is a nonzero vector in an inner product space, then it follows from Theorem 6.1.1b with $k = 1/\|\mathbf{v}\|$ that

$$\left\| \frac{1}{\|\mathbf{v}\|}\mathbf{v} \right\| = \left| \frac{1}{\|\mathbf{v}\|} \right| \|\mathbf{v}\| = \frac{1}{\|\mathbf{v}\|}\|\mathbf{v}\| = 1$$

from which we see that multiplying a nonzero vector by the reciprocal of its norm produces a vector of norm 1. This process is called normalizing $\mathbf{v}$. It follows that any orthogonal set of nonzero vectors can be converted to an orthonormal set by normalizing each of its vectors.


EXAMPLE  2 Constructing  an  Orthonormal  Set 


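The Euclidean norms of the vectors in Example 1 are

$$\|\mathbf{u}_1\| = 1, \quad \|\mathbf{u}_2\| = \sqrt{2}, \quad \|\mathbf{u}_3\| = \sqrt{2}$$

Consequently, normalizing $\mathbf{u}_1$, $\mathbf{u}_2$, and $\mathbf{u}_3$ yields

$$\mathbf{v}_1 = \frac{\mathbf{u}_1}{\|\mathbf{u}_1\|} = (0, 1, 0), \quad \mathbf{v}_2 = \frac{\mathbf{u}_2}{\|\mathbf{u}_2\|} = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right), \quad \mathbf{v}_3 = \frac{\mathbf{u}_3}{\|\mathbf{u}_3\|} = \left(\frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}}\right)$$

We leave it for you to verify that the set $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is orthonormal by showing that

$$\langle \mathbf{v}_1, \mathbf{v}_2 \rangle = \langle \mathbf{v}_1, \mathbf{v}_3 \rangle = \langle \mathbf{v}_2, \mathbf{v}_3 \rangle = 0 \quad \text{and} \quad \|\mathbf{v}_1\| = \|\mathbf{v}_2\| = \|\mathbf{v}_3\| = 1$$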
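The verification left to the reader in Example 2 can also be checked numerically. The following is a minimal sketch, assuming the Euclidean inner product on $R^3$; the helper names `dot`, `normalize`, and `is_orthonormal` are illustrative and do not come from the text, and floating-point comparisons use a small tolerance.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale v by the reciprocal of its Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("the zero vector cannot be normalized")
    return tuple(a / norm for a in v)

def is_orthonormal(vectors, tol=1e-12):
    """True if <v_i, v_j> is 1 when i == j and 0 otherwise, within tol."""
    for i, vi in enumerate(vectors):
        for j, vj in enumerate(vectors):
            expected = 1.0 if i == j else 0.0
            if abs(dot(vi, vj) - expected) > tol:
                return False
    return True

# The orthogonal set from Example 1 and its normalization from Example 2.
u1, u2, u3 = (0, 1, 0), (1, 0, 1), (1, 0, -1)
v1, v2, v3 = (normalize(u) for u in (u1, u2, u3))
print(is_orthonormal([v1, v2, v3]))
```

The original set fails the same check because $\mathbf{u}_2$ and $\mathbf{u}_3$ do not have norm 1, even though the set is orthogonal.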


In $R^2$ any two nonzero perpendicular vectors are linearly independent because neither is a scalar multiple of the other; and in $R^3$ any three nonzero mutually perpendicular vectors are linearly independent because no one lies in the plane of the other two (and hence is not expressible as a linear combination of the other two). The following theorem generalizes these observations.


THEOREM  6.3.1 

If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is an orthogonal set of nonzero vectors in an inner product space, then $S$ is linearly independent.


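Proof  Assume that

$$k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n = \mathbf{0} \tag{1}$$

To demonstrate that $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is linearly independent, we must prove that $k_1 = k_2 = \cdots = k_n = 0$.

For each $\mathbf{v}_i$ in $S$, it follows from 1 that

$$\langle k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n, \mathbf{v}_i \rangle = \langle \mathbf{0}, \mathbf{v}_i \rangle = 0$$

or, equivalently,

$$k_1\langle \mathbf{v}_1, \mathbf{v}_i \rangle + k_2\langle \mathbf{v}_2, \mathbf{v}_i \rangle + \cdots + k_n\langle \mathbf{v}_n, \mathbf{v}_i \rangle = 0$$

From the orthogonality of $S$ it follows that $\langle \mathbf{v}_j, \mathbf{v}_i \rangle = 0$ when $j \ne i$, so this equation reduces to

$$k_i\langle \mathbf{v}_i, \mathbf{v}_i \rangle = 0$$

Since the vectors in $S$ are assumed to be nonzero, it follows from the positivity axiom for inner products that $\langle \mathbf{v}_i, \mathbf{v}_i \rangle \ne 0$. Thus, the preceding equation implies that each $k_i$ in Equation 1 is zero, which is what we wanted to prove.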


Since  an  orthonormal  set  is  orthogonal,  and  since 
its  vectors  are  nonzero  (norm  1),  it  follows  from 
Theorem  6.3.1  that  every  orthonormal  set  is 
linearly  independent. 


In an inner product space, a basis consisting of orthonormal vectors is called an orthonormal basis, and a basis consisting of orthogonal vectors is called an orthogonal basis. A familiar example of an orthonormal basis is the standard basis for $R^n$ with the Euclidean inner product:

$$\mathbf{e}_1 = (1, 0, 0, \ldots, 0), \quad \mathbf{e}_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad \mathbf{e}_n = (0, 0, 0, \ldots, 1)$$

EXAMPLE  3 An  Orthonormal  Basis 

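In Example 2 we showed that the vectors

$$\mathbf{v}_1 = (0, 1, 0), \quad \mathbf{v}_2 = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right), \quad \text{and} \quad \mathbf{v}_3 = \left(\frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}}\right)$$

form an orthonormal set with respect to the Euclidean inner product on $R^3$. By Theorem 6.3.1, these vectors form a linearly independent set, and since $R^3$ is three-dimensional, it follows from Theorem 4.5.4 that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthonormal basis for $R^3$.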


Coordinates  Relative  to  Orthonormal  Bases 


One way to express a vector $\mathbf{u}$ as a linear combination of basis vectors

$$S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$$

is to convert the vector equation

$$\mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$$

to a linear system and solve for the coefficients $c_1, c_2, \ldots, c_n$. However, if the basis happens to be orthogonal or orthonormal, then the following theorem shows that the coefficients can be obtained more simply by computing appropriate inner products.


THEOREM  6.3.2 


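(a) If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is an orthogonal basis for an inner product space $V$, and if $\mathbf{u}$ is any vector in $V$, then

$$\mathbf{u} = \frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 + \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{u}, \mathbf{v}_n \rangle}{\|\mathbf{v}_n\|^2}\mathbf{v}_n \tag{2}$$

(b) If $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is an orthonormal basis for an inner product space $V$, and if $\mathbf{u}$ is any vector in $V$, then

$$\mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle\mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle\mathbf{v}_2 + \cdots + \langle \mathbf{u}, \mathbf{v}_n \rangle\mathbf{v}_n \tag{3}$$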


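Proof (a)  Since $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is a basis for $V$, every vector $\mathbf{u}$ in $V$ can be expressed in the form

$$\mathbf{u} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n$$

We will complete the proof by showing that

$$c_i = \frac{\langle \mathbf{u}, \mathbf{v}_i \rangle}{\|\mathbf{v}_i\|^2} \tag{4}$$

for $i = 1, 2, \ldots, n$. To do this, observe first that

$$\langle \mathbf{u}, \mathbf{v}_i \rangle = \langle c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_n\mathbf{v}_n, \mathbf{v}_i \rangle = c_1\langle \mathbf{v}_1, \mathbf{v}_i \rangle + c_2\langle \mathbf{v}_2, \mathbf{v}_i \rangle + \cdots + c_n\langle \mathbf{v}_n, \mathbf{v}_i \rangle$$

Since $S$ is an orthogonal set, all of the inner products in the last equality are zero except the $i$th, so we have

$$\langle \mathbf{u}, \mathbf{v}_i \rangle = c_i\langle \mathbf{v}_i, \mathbf{v}_i \rangle = c_i\|\mathbf{v}_i\|^2$$

Solving this equation for $c_i$ yields 4, which completes the proof.

Proof (b)  In this case, $\|\mathbf{v}_1\| = \|\mathbf{v}_2\| = \cdots = \|\mathbf{v}_n\| = 1$, so Formula 2 simplifies to Formula 3.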


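Using the terminology and notation from Definition 2 of Section 4.4, it follows from Theorem 6.3.2 that the coordinate vector of a vector $\mathbf{u}$ in $V$ relative to an orthogonal basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is

$$(\mathbf{u})_S = \left(\frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}, \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}, \ldots, \frac{\langle \mathbf{u}, \mathbf{v}_n \rangle}{\|\mathbf{v}_n\|^2}\right) \tag{5}$$

and relative to an orthonormal basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ is

$$(\mathbf{u})_S = (\langle \mathbf{u}, \mathbf{v}_1 \rangle, \langle \mathbf{u}, \mathbf{v}_2 \rangle, \ldots, \langle \mathbf{u}, \mathbf{v}_n \rangle) \tag{6}$$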


EXAMPLE  4 A Coordinate  Vector  Relative  to  an  Orthonormal  Basis 

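Let

$$\mathbf{v}_1 = (0, 1, 0), \quad \mathbf{v}_2 = \left(-\tfrac{4}{5}, 0, \tfrac{3}{5}\right), \quad \mathbf{v}_3 = \left(\tfrac{3}{5}, 0, \tfrac{4}{5}\right)$$

It is easy to check that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is an orthonormal basis for $R^3$ with the Euclidean inner product. Express the vector $\mathbf{u} = (1, 1, 1)$ as a linear combination of the vectors in $S$, and find the coordinate vector $(\mathbf{u})_S$.

We leave it for you to verify that

$$\langle \mathbf{u}, \mathbf{v}_1 \rangle = 1, \quad \langle \mathbf{u}, \mathbf{v}_2 \rangle = -\tfrac{1}{5}, \quad \text{and} \quad \langle \mathbf{u}, \mathbf{v}_3 \rangle = \tfrac{7}{5}$$

Therefore, by Theorem 6.3.2 we have

$$\mathbf{u} = \mathbf{v}_1 - \tfrac{1}{5}\mathbf{v}_2 + \tfrac{7}{5}\mathbf{v}_3$$

that is,

$$(1, 1, 1) = (0, 1, 0) - \tfrac{1}{5}\left(-\tfrac{4}{5}, 0, \tfrac{3}{5}\right) + \tfrac{7}{5}\left(\tfrac{3}{5}, 0, \tfrac{4}{5}\right)$$

Thus, the coordinate vector of $\mathbf{u}$ relative to $S$ is

$$(\mathbf{u})_S = (\langle \mathbf{u}, \mathbf{v}_1 \rangle, \langle \mathbf{u}, \mathbf{v}_2 \rangle, \langle \mathbf{u}, \mathbf{v}_3 \rangle) = \left(1, -\tfrac{1}{5}, \tfrac{7}{5}\right)$$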
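The computation in Example 4 amounts to taking one inner product per basis vector. As a minimal sketch, assuming the Euclidean inner product and exact rational arithmetic from Python's `fractions` module (the function names `coordinates_orthonormal` and `linear_combination` are illustrative, not from the text), Formula 6 can be evaluated and checked as follows.

```python
from fractions import Fraction as F

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def coordinates_orthonormal(u, basis):
    """Coordinate vector of u relative to an orthonormal basis (Formula 6)."""
    return tuple(dot(u, v) for v in basis)

def linear_combination(coeffs, basis):
    """Return c_1 v_1 + ... + c_n v_n as a tuple."""
    n = len(basis[0])
    return tuple(sum(c * v[k] for c, v in zip(coeffs, basis)) for k in range(n))

# Data of Example 4.
v1 = (F(0), F(1), F(0))
v2 = (F(-4, 5), F(0), F(3, 5))
v3 = (F(3, 5), F(0), F(4, 5))
u = (F(1), F(1), F(1))

coords = coordinates_orthonormal(u, [v1, v2, v3])
print(coords)
print(linear_combination(coords, [v1, v2, v3]) == u)
```

No linear system is solved here; the sketch relies entirely on the basis being orthonormal, and would give wrong coefficients for a general basis.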


EXAMPLE  5 An  Orthonormal  Basis  from  an  Orthogonal  Basis 


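(a) Show that the vectors

$$\mathbf{w}_1 = (0, 2, 0), \quad \mathbf{w}_2 = (3, 0, 3), \quad \mathbf{w}_3 = (-4, 0, 4)$$

form an orthogonal basis for $R^3$ with the Euclidean inner product, and use that basis to find an orthonormal basis by normalizing each vector.

(b) Express the vector $\mathbf{u} = (1, 2, 4)$ as a linear combination of the orthonormal basis vectors obtained in part (a).


Solution (a)

The given vectors form an orthogonal set since

$$\langle \mathbf{w}_1, \mathbf{w}_2 \rangle = 0, \quad \langle \mathbf{w}_1, \mathbf{w}_3 \rangle = 0, \quad \langle \mathbf{w}_2, \mathbf{w}_3 \rangle = 0$$

It follows from Theorem 6.3.1 that these vectors are linearly independent and hence form a basis for $R^3$ by Theorem 4.5.4. We leave it for you to calculate the norms of $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_3$ and then obtain the orthonormal basis

$$\mathbf{v}_1 = \frac{\mathbf{w}_1}{\|\mathbf{w}_1\|} = (0, 1, 0), \quad \mathbf{v}_2 = \frac{\mathbf{w}_2}{\|\mathbf{w}_2\|} = \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right), \quad \mathbf{v}_3 = \frac{\mathbf{w}_3}{\|\mathbf{w}_3\|} = \left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right)$$

Solution (b)

It follows from Formula 3 that

$$\mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle\mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle\mathbf{v}_2 + \langle \mathbf{u}, \mathbf{v}_3 \rangle\mathbf{v}_3$$

We leave it for you to confirm that

$$\langle \mathbf{u}, \mathbf{v}_1 \rangle = (1, 2, 4) \cdot (0, 1, 0) = 2$$

$$\langle \mathbf{u}, \mathbf{v}_2 \rangle = (1, 2, 4) \cdot \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) = \frac{5}{\sqrt{2}}$$

$$\langle \mathbf{u}, \mathbf{v}_3 \rangle = (1, 2, 4) \cdot \left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) = \frac{3}{\sqrt{2}}$$

and hence that

$$(1, 2, 4) = 2(0, 1, 0) + \frac{5}{\sqrt{2}}\left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) + \frac{3}{\sqrt{2}}\left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right)$$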
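Example 5 normalizes first and then applies Formula 3. The same vector can also be expanded directly in the unnormalized orthogonal basis using Formula 5, which avoids square roots. The sketch below assumes the Euclidean inner product; `coordinates_orthogonal` is a hypothetical helper name, and the coefficients it returns are relative to $\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3$ rather than to the normalized vectors.

```python
from fractions import Fraction

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def coordinates_orthogonal(u, basis):
    """Coefficients <u, w_i> / ||w_i||^2 relative to an orthogonal basis (Formula 5)."""
    return [Fraction(dot(u, w), dot(w, w)) for w in basis]

# Orthogonal (not orthonormal) basis and vector from Example 5.
w1, w2, w3 = (0, 2, 0), (3, 0, 3), (-4, 0, 4)
u = (1, 2, 4)

coeffs = coordinates_orthogonal(u, [w1, w2, w3])
# Rebuild u from the coefficients to confirm the expansion.
rebuilt = tuple(sum(c * w[k] for c, w in zip(coeffs, (w1, w2, w3))) for k in range(3))
print(coeffs, rebuilt)
```

Each coefficient here equals the corresponding coefficient in Example 5 divided by the norm of the basis vector, which is consistent with Formulas 2 and 3.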


Orthogonal  Projections 

Many applied problems are best solved by working with orthogonal or orthonormal basis vectors. Such bases are typically found by starting with some simple basis (say a standard basis) and then converting that basis into an orthogonal or orthonormal basis. To explain exactly how that is done will require some preliminary ideas about orthogonal projections.

In Section 3.3 we proved a result called the Projection Theorem (see Theorem 3.3.2) which dealt with the problem of decomposing a vector $\mathbf{u}$ in $R^n$ into a sum of two terms, $\mathbf{w}_1$ and $\mathbf{w}_2$, in which $\mathbf{w}_1$ is the orthogonal projection of $\mathbf{u}$ on some nonzero vector $\mathbf{a}$ and $\mathbf{w}_2$ is orthogonal to $\mathbf{w}_1$ (Figure 3.3.2). That result is a special case of the following more general theorem.


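THEOREM 6.3.3  Projection Theorem

If $W$ is a finite-dimensional subspace of an inner product space $V$, then every vector $\mathbf{u}$ in $V$ can be expressed in exactly one way as

$$\mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2 \tag{7}$$

where $\mathbf{w}_1$ is in $W$ and $\mathbf{w}_2$ is in $W^{\perp}$.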


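The vectors $\mathbf{w}_1$ and $\mathbf{w}_2$ in Formula 7 are commonly denoted by

$$\mathbf{w}_1 = \operatorname{proj}_W \mathbf{u} \quad \text{and} \quad \mathbf{w}_2 = \operatorname{proj}_{W^{\perp}} \mathbf{u} \tag{8}$$

They are called the orthogonal projection of $\mathbf{u}$ on $W$ and the orthogonal projection of $\mathbf{u}$ on $W^{\perp}$, respectively. The vector $\mathbf{w}_2$ is also called the component of $\mathbf{u}$ orthogonal to $W$. Using the notation in 8, Formula 7 can be expressed as

$$\mathbf{u} = \operatorname{proj}_W \mathbf{u} + \operatorname{proj}_{W^{\perp}} \mathbf{u} \tag{9}$$

(Figure 6.3.1). Moreover, since $\operatorname{proj}_{W^{\perp}} \mathbf{u} = \mathbf{u} - \operatorname{proj}_W \mathbf{u}$, we can also express Formula 9 as

$$\mathbf{u} = \operatorname{proj}_W \mathbf{u} + (\mathbf{u} - \operatorname{proj}_W \mathbf{u}) \tag{10}$$

Figure 6.3.1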


The  following  theorem  provides  formulas  for  calculating  orthogonal  projections. 


THEOREM  6.3.4 


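Let $W$ be a finite-dimensional subspace of an inner product space $V$.

(a) If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is an orthogonal basis for $W$, and $\mathbf{u}$ is any vector in $V$, then

$$\operatorname{proj}_W \mathbf{u} = \frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 + \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{u}, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\mathbf{v}_r \tag{11}$$

(b) If $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is an orthonormal basis for $W$, and $\mathbf{u}$ is any vector in $V$, then

$$\operatorname{proj}_W \mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle\mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle\mathbf{v}_2 + \cdots + \langle \mathbf{u}, \mathbf{v}_r \rangle\mathbf{v}_r \tag{12}$$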

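Proof (a)  It follows from Theorem 6.3.3 that the vector $\mathbf{u}$ can be expressed in the form $\mathbf{u} = \mathbf{w}_1 + \mathbf{w}_2$, where $\mathbf{w}_1 = \operatorname{proj}_W \mathbf{u}$ is in $W$ and $\mathbf{w}_2$ is in $W^{\perp}$; and it follows from Theorem 6.3.2 that the component $\operatorname{proj}_W \mathbf{u} = \mathbf{w}_1$ can be expressed in terms of the basis vectors for $W$ as

$$\operatorname{proj}_W \mathbf{u} = \mathbf{w}_1 = \frac{\langle \mathbf{w}_1, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 + \frac{\langle \mathbf{w}_1, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{w}_1, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\mathbf{v}_r \tag{13}$$

Since $\mathbf{w}_2$ is orthogonal to $W$, it follows that

$$\langle \mathbf{w}_2, \mathbf{v}_1 \rangle = \langle \mathbf{w}_2, \mathbf{v}_2 \rangle = \cdots = \langle \mathbf{w}_2, \mathbf{v}_r \rangle = 0$$

so we can rewrite 13 as

$$\operatorname{proj}_W \mathbf{u} = \mathbf{w}_1 = \frac{\langle \mathbf{w}_1 + \mathbf{w}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 + \frac{\langle \mathbf{w}_1 + \mathbf{w}_2, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{w}_1 + \mathbf{w}_2, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\mathbf{v}_r$$

or, equivalently, as

$$\operatorname{proj}_W \mathbf{u} = \mathbf{w}_1 = \frac{\langle \mathbf{u}, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 + \frac{\langle \mathbf{u}, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 + \cdots + \frac{\langle \mathbf{u}, \mathbf{v}_r \rangle}{\|\mathbf{v}_r\|^2}\mathbf{v}_r$$

Proof (b)  In this case, $\|\mathbf{v}_1\| = \|\mathbf{v}_2\| = \cdots = \|\mathbf{v}_r\| = 1$, so Formula 13 simplifies to Formula 12.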


EXAMPLE  6 Calculating  Projections 

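Let $R^3$ have the Euclidean inner product, and let $W$ be the subspace spanned by the orthonormal vectors $\mathbf{v}_1 = (0, 1, 0)$ and $\mathbf{v}_2 = \left(-\tfrac{4}{5}, 0, \tfrac{3}{5}\right)$. From Formula 12 the orthogonal projection of $\mathbf{u} = (1, 1, 1)$ on $W$ is

$$\operatorname{proj}_W \mathbf{u} = \langle \mathbf{u}, \mathbf{v}_1 \rangle\mathbf{v}_1 + \langle \mathbf{u}, \mathbf{v}_2 \rangle\mathbf{v}_2 = (1)(0, 1, 0) + \left(-\tfrac{1}{5}\right)\left(-\tfrac{4}{5}, 0, \tfrac{3}{5}\right) = \left(\tfrac{4}{25}, 1, -\tfrac{3}{25}\right)$$

The component of $\mathbf{u}$ orthogonal to $W$ is

$$\operatorname{proj}_{W^{\perp}} \mathbf{u} = \mathbf{u} - \operatorname{proj}_W \mathbf{u} = (1, 1, 1) - \left(\tfrac{4}{25}, 1, -\tfrac{3}{25}\right) = \left(\tfrac{21}{25}, 0, \tfrac{28}{25}\right)$$

Observe that $\operatorname{proj}_{W^{\perp}} \mathbf{u}$ is orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$, so this vector is orthogonal to each vector in the space $W$ spanned by $\mathbf{v}_1$ and $\mathbf{v}_2$, as it should be.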
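The projection in Example 6 can be reproduced with a short routine. This is a sketch under stated assumptions: the inner product is Euclidean, the spanning vectors passed in are nonzero and mutually orthogonal, and the name `project` is illustrative rather than taken from the text. It implements Formula 11, which reduces to Formula 12 when the vectors have norm 1.

```python
from fractions import Fraction

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def project(u, orthogonal_basis):
    """Orthogonal projection of u on span(orthogonal_basis), per Formula 11."""
    result = [Fraction(0)] * len(u)
    for v in orthogonal_basis:
        coeff = Fraction(dot(u, v)) / Fraction(dot(v, v))
        result = [r + coeff * a for r, a in zip(result, v)]
    return tuple(result)

# Data of Example 6.
v1 = (0, 1, 0)
v2 = (Fraction(-4, 5), 0, Fraction(3, 5))
u = (1, 1, 1)

proj = project(u, [v1, v2])
# Component of u orthogonal to W, as in Formula 10.
residual = tuple(a - p for a, p in zip(u, proj))
print(proj, residual)
```

Checking that `residual` has zero inner product with each spanning vector is a quick way to catch an input set that was not actually orthogonal.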


A Geometric  Interpretation  of  Orthogonal  Projections 


If $W$ is a one-dimensional subspace of an inner product space $V$, say $\operatorname{span}\{\mathbf{a}\}$, then Formula 11 has only the one term

$$\operatorname{proj}_W \mathbf{u} = \frac{\langle \mathbf{u}, \mathbf{a} \rangle}{\|\mathbf{a}\|^2}\mathbf{a}$$

In the special case where $V$ is $R^n$ with the Euclidean inner product, this is exactly Formula 10 of Section 3.3 for the orthogonal projection of $\mathbf{u}$ along $\mathbf{a}$. This suggests that we can think of 11 as the sum of orthogonal projections on "axes" determined by the basis vectors for the subspace $W$ (Figure 6.3.2).

Figure 6.3.2


The  Gram-Schmidt  Process 

We have seen that orthonormal bases exhibit a variety of useful properties. Our next theorem, which is the main result in this section, shows that every nonzero finite-dimensional inner product space has an orthonormal basis. The proof of this result is extremely important, since it provides an algorithm, or method, for converting an arbitrary basis into an orthonormal basis.


THEOREM  6.3.5 

Every  nonzero  finite-dimensional  inner  product  space  has  an  orthonormal  basis. 


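Proof  Let $W$ be any nonzero finite-dimensional subspace of an inner product space, and suppose that $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$ is any basis for $W$. It suffices to show that $W$ has an orthogonal basis, since the vectors in that basis can be normalized to obtain an orthonormal basis. The following sequence of steps will produce an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ for $W$:

Step 1.  Let $\mathbf{v}_1 = \mathbf{u}_1$.

Step 2.  As illustrated in Figure 6.3.3, we can obtain a vector $\mathbf{v}_2$ that is orthogonal to $\mathbf{v}_1$ by computing the component of $\mathbf{u}_2$ that is orthogonal to the space $W_1$ spanned by $\mathbf{v}_1$. Using Formula 11 to perform this computation we obtain

$$\mathbf{v}_2 = \mathbf{u}_2 - \operatorname{proj}_{W_1} \mathbf{u}_2 = \mathbf{u}_2 - \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1$$

Of course, if $\mathbf{v}_2 = \mathbf{0}$, then $\mathbf{v}_2$ is not a basis vector. But this cannot happen, since it would then follow from the above formula for $\mathbf{v}_2$ that

$$\mathbf{u}_2 = \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 = \frac{\langle \mathbf{u}_2, \mathbf{u}_1 \rangle}{\|\mathbf{u}_1\|^2}\mathbf{u}_1$$

which implies that $\mathbf{u}_2$ is a multiple of $\mathbf{u}_1$, contradicting the linear independence of the basis $S = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$.

Figure 6.3.3

Step 3.  To construct a vector $\mathbf{v}_3$ that is orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$, we compute the component of $\mathbf{u}_3$ orthogonal to the space $W_2$ spanned by $\mathbf{v}_1$ and $\mathbf{v}_2$ (Figure 6.3.4). Using Formula 11 to perform this computation we obtain

$$\mathbf{v}_3 = \mathbf{u}_3 - \operatorname{proj}_{W_2} \mathbf{u}_3 = \mathbf{u}_3 - \frac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2$$

As in Step 2, the linear independence of $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$ ensures that $\mathbf{v}_3 \ne \mathbf{0}$. We leave the details for you.

Figure 6.3.4

Step 4.  To determine a vector $\mathbf{v}_4$ that is orthogonal to $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, we compute the component of $\mathbf{u}_4$ orthogonal to the space $W_3$ spanned by $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. From 11,

$$\mathbf{v}_4 = \mathbf{u}_4 - \operatorname{proj}_{W_3} \mathbf{u}_4 = \mathbf{u}_4 - \frac{\langle \mathbf{u}_4, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{\langle \mathbf{u}_4, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 - \frac{\langle \mathbf{u}_4, \mathbf{v}_3 \rangle}{\|\mathbf{v}_3\|^2}\mathbf{v}_3$$

Continuing in this way we will produce an orthogonal set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ after $r$ steps. Since orthogonal sets are linearly independent, this set will be an orthogonal basis for the $r$-dimensional space $W$. By normalizing these basis vectors we can obtain an orthonormal basis.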


The  step-by-step  construction  of  an  orthogonal  (or  orthonormal)  basis  given  in  the  foregoing  proof  is  called  the 
Gram-Schmidt process.  For  reference,  we  provide  the  following  summary  of  the  steps. 


The  Gram-Schmidt  Process 


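To convert a basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_r\}$ into an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$, perform the following computations:

Step 1.  $\mathbf{v}_1 = \mathbf{u}_1$

Step 2.  $\mathbf{v}_2 = \mathbf{u}_2 - \dfrac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1$

Step 3.  $\mathbf{v}_3 = \mathbf{u}_3 - \dfrac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \dfrac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2$

Step 4.  $\mathbf{v}_4 = \mathbf{u}_4 - \dfrac{\langle \mathbf{u}_4, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \dfrac{\langle \mathbf{u}_4, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 - \dfrac{\langle \mathbf{u}_4, \mathbf{v}_3 \rangle}{\|\mathbf{v}_3\|^2}\mathbf{v}_3$

(continue for $r$ steps)

Optional Step.  To convert the orthogonal basis into an orthonormal basis $\{\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_r\}$, normalize the orthogonal basis vectors.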
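The steps summarized above translate directly into a loop. The following is a minimal sketch, assuming the Euclidean inner product on $R^n$ and exact rational arithmetic; the function name `gram_schmidt` and the choice to raise an error on linearly dependent input are illustrative decisions, not part of the text. It omits the optional normalization step so that all entries stay rational.

```python
from fractions import Fraction

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Convert a basis u_1, ..., u_r into an orthogonal basis v_1, ..., v_r.

    At step k, subtract from u_k its projection on each previously
    constructed v_j, using the coefficient <u_k, v_j> / ||v_j||^2.
    """
    orthogonal = []
    for u in basis:
        v = [Fraction(a) for a in u]
        for w in orthogonal:
            coeff = Fraction(dot(u, w)) / Fraction(dot(w, w))
            v = [a - coeff * b for a, b in zip(v, w)]
        if all(a == 0 for a in v):
            raise ValueError("input vectors are linearly dependent")
        orthogonal.append(tuple(v))
    return orthogonal

# Three linearly independent vectors in R^3, chosen for illustration.
vs = gram_schmidt([(1, 1, 1), (0, 1, 1), (0, 0, 1)])
for v in vs:
    print(v)
```

In floating-point arithmetic this classical form can lose orthogonality for nearly dependent inputs; numerical software typically uses a modified variant or a different method altogether, which is outside the scope of this sketch.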



EXAMPLE  7 Using  the  Gram-Schmidt  Process 

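Assume that the vector space $R^3$ has the Euclidean inner product. Apply the Gram-Schmidt process to transform the basis vectors

$$\mathbf{u}_1 = (1, 1, 1), \quad \mathbf{u}_2 = (0, 1, 1), \quad \mathbf{u}_3 = (0, 0, 1)$$

into an orthogonal basis $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, and then normalize the orthogonal basis vectors to obtain an orthonormal basis $\{\mathbf{q}_1, \mathbf{q}_2, \mathbf{q}_3\}$.

Solution

Step 1.  $\mathbf{v}_1 = \mathbf{u}_1 = (1, 1, 1)$

Step 2.

$$\mathbf{v}_2 = \mathbf{u}_2 - \operatorname{proj}_{W_1} \mathbf{u}_2 = \mathbf{u}_2 - \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 = (0, 1, 1) - \tfrac{2}{3}(1, 1, 1) = \left(-\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right)$$

Step 3.

$$\mathbf{v}_3 = \mathbf{u}_3 - \operatorname{proj}_{W_2} \mathbf{u}_3 = \mathbf{u}_3 - \frac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2$$

$$= (0, 0, 1) - \tfrac{1}{3}(1, 1, 1) - \frac{1/3}{2/3}\left(-\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right) = \left(0, -\tfrac{1}{2}, \tfrac{1}{2}\right)$$

Thus,

$$\mathbf{v}_1 = (1, 1, 1), \quad \mathbf{v}_2 = \left(-\tfrac{2}{3}, \tfrac{1}{3}, \tfrac{1}{3}\right), \quad \mathbf{v}_3 = \left(0, -\tfrac{1}{2}, \tfrac{1}{2}\right)$$

form an orthogonal basis for $R^3$. The norms of these vectors are

$$\|\mathbf{v}_1\| = \sqrt{3}, \quad \|\mathbf{v}_2\| = \frac{\sqrt{6}}{3}, \quad \|\mathbf{v}_3\| = \frac{1}{\sqrt{2}}$$

so an orthonormal basis for $R^3$ is

$$\mathbf{q}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} = \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right), \quad \mathbf{q}_2 = \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} = \left(-\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}\right), \quad \mathbf{q}_3 = \frac{\mathbf{v}_3}{\|\mathbf{v}_3\|} = \left(0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$$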


In  the  last  example  we  normalized  at  the  end  to  convert  the  orthogonal  basis  into  an  orthonormal  basis. 
Alternatively,  we  could  have  normalized  each  orthogonal  basis  vector  as  soon  as  it  was  obtained,  thereby  producing 
an  orthonormal  basis  step  by  step.  However,  that  procedure  generally  has  the  disadvantage  in  hand  calculation  of 
producing  more  square  roots  to  manipulate.  A more  useful  variation  is  to  “scale”  the  orthogonal  basis  vectors  at 
each  step  to  eliminate  some  of  the  fractions.  For  example,  after  Step  2 above,  we  could  have  multiplied  by  3 to 
produce  ( — 2,  1,  1)  as  the  second  orthogonal  basis  vector,  thereby  simplifying  the  calculations  in  Step  3. 


Erhardt  Schmidt  (1875-1959) 

Schmidt was a German mathematician who studied for his doctoral degree at Göttingen University under David Hilbert, one of the giants of modern mathematics. For most of his life he taught at Berlin University where, in addition to making important contributions to many branches of mathematics, he fashioned some of Hilbert's ideas into a general concept, called a Hilbert space, a fundamental idea in the study of infinite-dimensional vector spaces. He first described the process that bears his name in a paper on integral equations that he published in 1907.

[Image:  Archives  of  the  Mathematisches  Forschungsinst\ 


Jorgen Pederson Gram (1850-1916)

Gram was a Danish actuary whose early education was at village schools supplemented by private tutoring. He obtained a doctorate degree in mathematics while working for the Hafnia Life Insurance Company, where he specialized in the mathematics of accident insurance. It was in his dissertation that his contributions to the Gram-Schmidt process were formulated. He eventually became interested in abstract mathematics and received a gold medal from the Royal Danish Society of Sciences and Letters in recognition of his work. His lifelong interest in applied mathematics never wavered, however, and he produced a variety of treatises on Danish forest management.

[Image:  wikipedia] 


CALCULUS  REQUIRED 


EXAMPLE  8 Legendre  Polynomials 


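Let the vector space $P_2$ have the inner product

$$\langle \mathbf{p}, \mathbf{q} \rangle = \int_{-1}^{1} p(x)q(x)\,dx$$

Apply the Gram-Schmidt process to transform the standard basis $\{1, x, x^2\}$ for $P_2$ into an orthogonal basis $\{\phi_1(x), \phi_2(x), \phi_3(x)\}$.

Solution  Take $\mathbf{u}_1 = 1$, $\mathbf{u}_2 = x$, and $\mathbf{u}_3 = x^2$.

Step 1.  $\mathbf{v}_1 = \mathbf{u}_1 = 1$

Step 2.  We have

$$\langle \mathbf{u}_2, \mathbf{v}_1 \rangle = \int_{-1}^{1} x\,dx = 0$$

so

$$\mathbf{v}_2 = \mathbf{u}_2 - \frac{\langle \mathbf{u}_2, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 = \mathbf{u}_2 = x$$

Step 3.  We have

$$\langle \mathbf{u}_3, \mathbf{v}_1 \rangle = \int_{-1}^{1} x^2\,dx = \left.\frac{x^3}{3}\right]_{-1}^{1} = \frac{2}{3}$$

$$\langle \mathbf{u}_3, \mathbf{v}_2 \rangle = \int_{-1}^{1} x^3\,dx = \left.\frac{x^4}{4}\right]_{-1}^{1} = 0$$

$$\|\mathbf{v}_1\|^2 = \langle \mathbf{v}_1, \mathbf{v}_1 \rangle = \int_{-1}^{1} 1\,dx = \left. x \right]_{-1}^{1} = 2$$

so

$$\mathbf{v}_3 = \mathbf{u}_3 - \frac{\langle \mathbf{u}_3, \mathbf{v}_1 \rangle}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{\langle \mathbf{u}_3, \mathbf{v}_2 \rangle}{\|\mathbf{v}_2\|^2}\mathbf{v}_2 = x^2 - \frac{1}{3}$$

Thus, we have obtained the orthogonal basis $\{\phi_1(x), \phi_2(x), \phi_3(x)\}$ in which

$$\phi_1(x) = 1, \quad \phi_2(x) = x, \quad \phi_3(x) = x^2 - \frac{1}{3}$$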


The orthogonal basis vectors in the foregoing example are often scaled so all three functions have a value of 1 at $x = 1$. The resulting polynomials

$$1, \quad x, \quad \tfrac{1}{2}(3x^2 - 1)$$

which are known as the first three Legendre polynomials, play an important role in a variety of applications. The scaling does not affect the orthogonality.
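The same process can be carried out mechanically for polynomials. In the sketch below, a polynomial in $P_2$ is represented by its coefficient list `[c0, c1, c2]` for $c_0 + c_1x + c_2x^2$, and the inner product is the integral over $[-1, 1]$ used in Example 8. The representation and the names `poly_mul`, `inner`, and `gram_schmidt_poly` are assumptions made for illustration; the integral is evaluated exactly from the fact that $\int_{-1}^{1} x^k\,dx$ is $2/(k+1)$ for even $k$ and 0 for odd $k$.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Coefficient list of the product p(x) q(x), lowest degree first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1], computed exactly."""
    total = Fraction(0)
    for k, c in enumerate(poly_mul(p, q)):
        if k % 2 == 0:  # odd powers integrate to zero on [-1, 1]
            total += c * Fraction(2, k + 1)
    return total

def gram_schmidt_poly(basis):
    """Gram-Schmidt on equal-length coefficient lists, without normalizing."""
    result = []
    for u in basis:
        v = [Fraction(c) for c in u]
        for w in result:
            coeff = inner(u, w) / inner(w, w)
            v = [a - coeff * b for a, b in zip(v, w)]
        result.append(v)
    return result

# Standard basis 1, x, x^2 of P_2.
phi = gram_schmidt_poly([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
# Scale phi_3 so that it equals 1 at x = 1, as described for the Legendre polynomials.
legendre3 = [c / sum(phi[2]) for c in phi[2]]
print(phi, legendre3)
```

Here `sum(phi[2])` is the value of $\phi_3$ at $x = 1$, since evaluating a polynomial at 1 just adds its coefficients.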


Extending  Orthonormal  Sets  to  Orthonormal  Bases 

Recall  from  part  ( b ) of  Theorem  4.5.5  that  a linearly  independent  set  in  a finite-dimensional  vector  space  can  be 
enlarged  to  a basis  by  adding  appropriate  vectors.  The  following  theorem  is  an  analog  of  that  result  for  orthogonal 
and  orthonormal  sets  in  finite-dimensional  inner  product  spaces. 


THEOREM  6.3.6 

If $W$ is a finite-dimensional inner product space, then:

(a) Every orthogonal set of nonzero vectors in $W$ can be enlarged to an orthogonal basis for $W$.

(b) Every orthonormal set in $W$ can be enlarged to an orthonormal basis for $W$.


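Proof  We will prove part (b) and leave part (a) as an exercise.

Suppose that $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_s\}$ is an orthonormal set of vectors in $W$. Part (b) of Theorem 4.5.5 tells us that we can enlarge $S$ to some basis

$$S' = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_s, \mathbf{v}_{s+1}, \ldots, \mathbf{v}_k\}$$

for $W$. If we now apply the Gram-Schmidt process to the set $S'$, then the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_s$ will not be affected since they are already orthonormal, and the resulting set

$$S'' = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_s, \mathbf{v}'_{s+1}, \ldots, \mathbf{v}'_k\}$$

will be an orthonormal basis for $W$.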


OPTIONAL 

QR-Decomposition 

In  recent  years  a numerical  algorithm  based  on  the  Gram-Schmidt  process,  and  known  as  QR-decomposition , has 
assumed  growing  importance  as  the  mathematical  foundation  for  a wide  variety  of  numerical  algorithms,  including 
those  for  computing  eigenvalues  of  large  matrices.  The  technical  aspects  of  such  algorithms  are  discussed  in 
textbooks  that  specialize  in  the  numerical  aspects  of  linear  algebra.  However,  we  will  discuss  some  of  the 
underlying  ideas  here.  We  begin  by  posing  the  following  problem. 


Problem 

If $A$ is an $m \times n$ matrix with linearly independent column vectors, and if $Q$ is the matrix that results by applying the Gram-Schmidt process to the column vectors of $A$, what relationship, if any, exists between $A$ and $Q$?


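To solve this problem, suppose that the column vectors of $A$ are $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$ and the orthonormal column vectors of $Q$ are $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_n$. Thus, $A$ and $Q$ can be written in partitioned form as

$$A = [\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n] \quad \text{and} \quad Q = [\mathbf{q}_1 \mid \mathbf{q}_2 \mid \cdots \mid \mathbf{q}_n]$$

It follows from Theorem 6.3.2b that $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$ are expressible in terms of the vectors $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_n$ as

$$\begin{aligned}
\mathbf{u}_1 &= \langle \mathbf{u}_1, \mathbf{q}_1 \rangle\mathbf{q}_1 + \langle \mathbf{u}_1, \mathbf{q}_2 \rangle\mathbf{q}_2 + \cdots + \langle \mathbf{u}_1, \mathbf{q}_n \rangle\mathbf{q}_n \\
\mathbf{u}_2 &= \langle \mathbf{u}_2, \mathbf{q}_1 \rangle\mathbf{q}_1 + \langle \mathbf{u}_2, \mathbf{q}_2 \rangle\mathbf{q}_2 + \cdots + \langle \mathbf{u}_2, \mathbf{q}_n \rangle\mathbf{q}_n \\
&\;\;\vdots \\
\mathbf{u}_n &= \langle \mathbf{u}_n, \mathbf{q}_1 \rangle\mathbf{q}_1 + \langle \mathbf{u}_n, \mathbf{q}_2 \rangle\mathbf{q}_2 + \cdots + \langle \mathbf{u}_n, \mathbf{q}_n \rangle\mathbf{q}_n
\end{aligned}$$

Recalling from Section 1.3 (Example 9) that the $j$th column vector of a matrix product is a linear combination of the column vectors of the first factor with coefficients coming from the $j$th column of the second factor, it follows that these relationships can be expressed in matrix form as

$$[\mathbf{u}_1 \mid \mathbf{u}_2 \mid \cdots \mid \mathbf{u}_n] = [\mathbf{q}_1 \mid \mathbf{q}_2 \mid \cdots \mid \mathbf{q}_n]
\begin{bmatrix}
\langle \mathbf{u}_1, \mathbf{q}_1 \rangle & \langle \mathbf{u}_2, \mathbf{q}_1 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_1 \rangle \\
\langle \mathbf{u}_1, \mathbf{q}_2 \rangle & \langle \mathbf{u}_2, \mathbf{q}_2 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_2 \rangle \\
\vdots & \vdots & & \vdots \\
\langle \mathbf{u}_1, \mathbf{q}_n \rangle & \langle \mathbf{u}_2, \mathbf{q}_n \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_n \rangle
\end{bmatrix}$$

or more briefly as

$$A = QR \tag{14}$$

where $R$ is the second factor in the product. However, it is a property of the Gram-Schmidt process that for $j \ge 2$, the vector $\mathbf{q}_j$ is orthogonal to $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{j-1}$. Thus, all entries below the main diagonal of $R$ are zero, and $R$ has the form

$$R = \begin{bmatrix}
\langle \mathbf{u}_1, \mathbf{q}_1 \rangle & \langle \mathbf{u}_2, \mathbf{q}_1 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_1 \rangle \\
0 & \langle \mathbf{u}_2, \mathbf{q}_2 \rangle & \cdots & \langle \mathbf{u}_n, \mathbf{q}_2 \rangle \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \langle \mathbf{u}_n, \mathbf{q}_n \rangle
\end{bmatrix} \tag{15}$$

We leave it for you to show that $R$ is invertible by showing that its diagonal entries are nonzero. Thus, Equation 14 is a factorization of $A$ into the product of a matrix $Q$ with orthonormal column vectors and an invertible upper triangular matrix $R$. We call Equation 14 the QR-decomposition of $A$. In summary, we have the following theorem.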


QR-Decomposition

If $A$ is an $m \times n$ matrix with linearly independent column vectors, then $A$ can be factored as

$$A = QR$$

where $Q$ is an $m \times n$ matrix with orthonormal column vectors, and $R$ is an $n \times n$ invertible upper triangular matrix.


It  is  common  in  numerical  linear  algebra  to  say 
that  a matrix  with  linearly  independent  columns 
has  full  column  rank. 


Recall  from  Theorem  5.1.6  (the  Equivalence  Theorem)  that  a square  matrix  has  linearly  independent  column 
vectors  if  and  only  if  it  is  invertible.  Thus,  it  follows  from  the  foregoing  theorem  that  every  invertible  matrix  has  a 
QR-decomposition . 


EXAMPLE  9 QR-Decomposition  of  a 3 x 3 Matrix 


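Find the QR-decomposition of

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$$

The column vectors of $A$ are

$$\mathbf{u}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{u}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

Applying the Gram-Schmidt process with normalization to these column vectors yields the orthonormal vectors (see Example 7)

$$\mathbf{q}_1 = \begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix}, \quad \mathbf{q}_2 = \begin{bmatrix} -\frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \end{bmatrix}, \quad \mathbf{q}_3 = \begin{bmatrix} 0 \\ -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$$

Thus, it follows from Formula 15 that $R$ is

$$R = \begin{bmatrix} \langle \mathbf{u}_1, \mathbf{q}_1 \rangle & \langle \mathbf{u}_2, \mathbf{q}_1 \rangle & \langle \mathbf{u}_3, \mathbf{q}_1 \rangle \\ 0 & \langle \mathbf{u}_2, \mathbf{q}_2 \rangle & \langle \mathbf{u}_3, \mathbf{q}_2 \rangle \\ 0 & 0 & \langle \mathbf{u}_3, \mathbf{q}_3 \rangle \end{bmatrix} = \begin{bmatrix} \frac{3}{\sqrt{3}} & \frac{2}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ 0 & 0 & \frac{1}{\sqrt{2}} \end{bmatrix}$$

from which it follows that the QR-decomposition of $A$ is

$$\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} \frac{1}{\sqrt{3}} & -\frac{2}{\sqrt{6}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} \end{bmatrix}}_{Q} \underbrace{\begin{bmatrix} \frac{3}{\sqrt{3}} & \frac{2}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ 0 & 0 & \frac{1}{\sqrt{2}} \end{bmatrix}}_{R}$$

Show that the matrix $Q$ in Example 9 has the property $Q^TQ = I$, and show that every $m \times n$ matrix with orthonormal column vectors has this property.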

Concept  Review 

Orthogonal  and  orthonormal  sets 
Normalizing  a vector 
Orthogonal  projections 
Gram-Schmidt  process 
QR-decomposition


Skills 


Determine  whether  a set  of  vectors  is  orthogonal  (or  orthonormal). 

Compute  the  coordinates  of  a vector  with  respect  to  an  orthogonal  (or  orthonormal)  basis. 

Find  the  orthogonal  projection  of  a vector  onto  a subspace. 

Use  the  Gram-Schmidt  process  to  construct  an  orthogonal  (or  orthonormal)  basis  for  an  inner  product 
space. 

Find the QR-decomposition of an invertible matrix.


Exercise  Set  6.3 


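1. Which of the following sets of vectors are orthogonal with respect to the Euclidean inner product on $R^2$?

(a) $(0, 1), (2, 0)$

(b) $\left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right), \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$

(c) $\left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right), \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$

(d) $(0, 0), (0, 1)$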


Answer: 

(a),  (b),  (d) 

2. Which of the sets in Exercise 1 are orthonormal with respect to the Euclidean inner product on $R^2$?

3. Which of the following sets of vectors are orthogonal with respect to the Euclidean inner product on $R^3$?


(b) $\left(\frac{2}{3}, -\frac{2}{3}, \frac{1}{3}\right), \left(\frac{2}{3}, \frac{1}{3}, -\frac{2}{3}\right), \left(\frac{1}{3}, \frac{2}{3}, \frac{2}{3}\right)$


(c) 


(d) 


0.0.  0), 


1 1 


o 7?’  7?  J’ 


1 L.o' 


{s’  {s’  {lj  \f2  {2 


Answer: 


(b),  (d) 


4. Which of the sets in Exercise 3 are orthonormal with respect to the Euclidean inner product on $R^3$?

5.  Which  of  the  following  sets  of  polynomials  are  orthonormal  with  respect  to  the  inner  product  on  P2  discussed  in 
Example  7 of  Section  6.1  ? 

(a)  pi  Or)  = | - jx  + jx2,  P2(x)  = | + yx  - jx2,  p3(x)  = j 4-  jx  + jx2 


(b)  Pl(x)  = 1,  p2(x)  =-W  + -j=x2,  p3(x)  =X2 

y 2 y 2 


Answer: 


(a) 


6.  Which  of  the  following  sets  of  matrices  are  orthonormal  with  respect  to  the  inner  product  on  M 22  discussed  in 
Example  6 of  Section  6.1  ? 


1 0 
0 0 


(b) 


‘ I 


0 !l 

0 7 

3 

3 

2 1 

* 

2 2 

3 3 

3 3 

fl  01 

"0  f 

1 

O 

O 

O 

O 

O 

O 

l 

7 

O 

O 

l 

7 

L>  1 

7 

L>  -'J 

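7. Verify that the given vectors form an orthogonal set with respect to the Euclidean inner product; then convert it to an orthonormal set by normalizing the vectors.

(a) $(-1, 2), (6, 3)$

(b) $(1, 0, -1), (2, 0, 2), (0, 5, 0)$

(c) $\left(\frac{1}{5}, \frac{1}{5}, \frac{1}{5}\right), \left(-\frac{1}{2}, \frac{1}{2}, 0\right), \left(\frac{1}{3}, \frac{1}{3}, -\frac{2}{3}\right)$

Answer:

(a) $\left(-\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}\right), \left(\frac{2}{\sqrt{5}}, \frac{1}{\sqrt{5}}\right)$

(b) $\left(\frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}}\right), \left(\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right), (0, 1, 0)$

(c) $\left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right), \left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0\right), \left(\frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}, -\frac{2}{\sqrt{6}}\right)$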


8. Verify that the set of vectors {(1, 0), (0, 1)} is orthogonal with respect to the inner product
$\langle u, v \rangle = 4u_1v_1 + u_2v_2$ on $R^2$; then convert it to an orthonormal set by normalizing the vectors.

9.  Verify  that  the  vectors 

VI  = ( - f . f.  0 j,  v2  = 0 j,  v3  = (0,  0,  1) 

form an orthonormal basis for $R^3$ with the Euclidean inner product; then use Theorem 6.3.2b to express each of
the following as linear combinations of $v_1$, $v_2$, and $v_3$.

(a)  (1,  -1,2) 

(b)  (3,  -7,4) 

(c)  fi  _ 1 5 ) 

\T  I’l) 


Answer: 


(a)  — ^-vi  + ^V2  + 2V3 

(b)  ^.Vj  _ ^y2  4-  4v3 

(c)  _^Vl  _ iV2  + ^V3 


10. Verify that the vectors

$v_1 = (1, -1, 2, -1)$, $v_2 = (-2, 2, 3, 2)$,
$v_3 = (1, 2, 0, -1)$, $v_4 = (1, 0, 0, 1)$

form an orthogonal basis for $R^4$ with the Euclidean inner product; then use Theorem 6.3.2a to express each of
the following as linear combinations of $v_1$, $v_2$, $v_3$, and $v_4$.

(a)  (1,  1,  1,  1) 

(b)  [i[2,  -3/2,  5/2,  -{2) 


(c) 


/_!  2 _1  4\ 
^ 3’ 3’  3’ 3 ) 


11. (a) Show that the vectors

$v_1 = (1, -2, 3, -4)$, $v_2 = (2, 1, -4, -3)$,
$v_3 = (-3, 4, 1, -2)$, $v_4 = (4, 3, 2, 1)$

form an orthogonal basis for $R^4$ with the Euclidean inner product.

(b) Use Theorem 6.3.2a to express $u = (-1, 2, 3, 7)$ as a linear combination of the vectors in part (a).

Answer:

(b) $u = -\frac{4}{5}v_1 - \frac{11}{10}v_2 + 0v_3 + \frac{1}{2}v_4$

In Exercises 12-13, an orthonormal basis with respect to the Euclidean inner product is given. Use Theorem 6.3.2b
to find the coordinate vector of w with respect to that basis.


12. 


<“)„=(3,7);„1  = (J=,--Lj,u2  = 


J 1 

1/2'  /2 


(b)  «r=(-1,0,2);«i  = (|.  7)  »2=(f.  7,  -f),«3=(|.  f.f) 

13'  (a)  w = (2,  0,  5);  u,  = (§,  1 §),  „2  = (I  f , - 1), u3  = (§,  - § , - 1) 


1 


1 


(b)w=(-l,,,2),u1  = (-^,7=,v=|,„2  = 


__  J 2_  J_ 

l ft’  ft’  ft}’ 


i 


u3  = 


1 


\ 


fte’  fte’  fte 


Answer: 


W=  -y-ui  - |u2  - JU3 


(a) 


In Exercises 14-15, the given vectors are orthogonal with respect to the Euclidean inner product. Find $\operatorname{proj}_W x$,
where $x = (1, 2, 0, -2)$ and W is the subspace of $R^4$ spanned by the vectors.


14. (a) $v_1 = (1, 1, 1, 1)$, $v_2 = (1, 1, -1, -1)$

(b) $v_1 = (0, 1, -4, -1)$, $v_2 = (3, 5, 1, 1)$

15. (a) $v_1 = (1, 1, 1, 1)$, $v_2 = (1, 1, -1, -1)$, $v_3 = (1, -1, 1, -1)$

(b) $v_1 = (0, 1, -4, -1)$, $v_2 = (3, 5, 1, 1)$, $v_3 = (1, 0, 1, -4)$


Answer: 


(b)  m i _j_  _m 

\\2’ A’  12’  12  J 

In Exercises 16-17, the given vectors are orthonormal with respect to the Euclidean inner product. Use Theorem
6.3.4b to find $\operatorname{proj}_W x$, where $x = (1, 2, 0, -1)$ and W is the subspace of $R^4$ spanned by the vectors.


16. 


(a) 


vj  = 


o _J 4 i_i  r_n  5 i n 

/ /is’  /is’  /l8  }’  2 \2’  6’  6’  6) 


(b) 


17. 


(a) 


vi  = fl  1 1 11  v2  = f—  - --  -II 

1 \2’  2’  2’  2 )’  2 \2’  2’  2’  2) 

V‘  = (°'  TlS  ’ _ '/s  ’ ' /18  )’ V2  = (2- 1’  6 ’ i)’ T3  = (7k'  0>  7k 


4 

f\ 8 


(b) 


vi 


1 1 


1 1 


2’ 2’  2’  2 i 


).y3=( 


1 _i  I 

2’  2’  2’ 


\ 

h 


Answer: 

(a)  (23  EL  __L 

\\S’  6 ’ 18’  18 ) 

(b)  3 3 _I  _ n 

\2’  2’  2’  2) 


18. In Example 6 of Section 4.9 we found the orthogonal projection of the vector x = (1, 5) onto the line through
the origin making an angle of $\pi/6$ radians with the x-axis. Solve that same problem using Theorem 6.3.4.

19. Find the vectors $w_1$ in $W$ and $w_2$ in $W^\perp$ such that $x = w_1 + w_2$, where x and W are as given in

(a)  Exercise  14(a). 

(b)  Exercise  15(a). 


Answer: 


<a)wi  = (§,§,  -1,  -lj,  w2=(-I  i.l,  -l) 

(b) 


(1 

5 

3 

5^  / 

3 

3 

3 

3^ 

-\4’ 

4’ 

4’ 

4’ 

4’ 

4’ 

4j 

20. Find the vectors $w_1$ in $W$ and $w_2$ in $W^\perp$ such that $x = w_1 + w_2$, where x and W are as given in

(a)  Exercise  16(a). 

(b)  Exercise  17(a). 

21. Let $R^2$ have the Euclidean inner product. Use the Gram-Schmidt process to transform the basis $\{u_1, u_2\}$ into
an orthonormal basis. Draw both sets of basis vectors in the xy-plane.

(a) $u_1 = (1, -3)$, $u_2 = (2, 2)$

(b) $u_1 = (1, 0)$, $u_2 = (3, -5)$

Answer:

(a) $v_1 = \left(\frac{1}{\sqrt{10}}, -\frac{3}{\sqrt{10}}\right)$, $v_2 = \left(\frac{3}{\sqrt{10}}, \frac{1}{\sqrt{10}}\right)$

(b) $v_1 = (1, 0)$, $v_2 = (0, -1)$

22. Let $R^3$ have the Euclidean inner product. Use the Gram-Schmidt process to transform the basis $\{u_1, u_2, u_3\}$
into an orthonormal basis.

(a) $u_1 = (1, 1, 1)$, $u_2 = (-1, 1, 0)$, $u_3 = (1, 2, 1)$

(b) $u_1 = (1, 0, 0)$, $u_2 = (3, 7, -2)$, $u_3 = (0, 4, 1)$

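23. Let $R^4$ have the Euclidean inner product. Use the Gram-Schmidt process to transform the basis
$\{u_1, u_2, u_3, u_4\}$ into an orthonormal basis.

$u_1 = (0, 2, 1, 0)$, $u_2 = (1, -1, 0, 0)$,
$u_3 = (1, 2, 0, -1)$, $u_4 = (1, 0, 0, 1)$

Answer:

$v_1 = \left(0, \frac{2}{\sqrt{5}}, \frac{1}{\sqrt{5}}, 0\right)$, $v_2 = \left(\frac{5}{\sqrt{30}}, -\frac{1}{\sqrt{30}}, \frac{2}{\sqrt{30}}, 0\right)$,

$v_3 = \left(\frac{1}{\sqrt{10}}, \frac{1}{\sqrt{10}}, -\frac{2}{\sqrt{10}}, -\frac{2}{\sqrt{10}}\right)$, $v_4 = \left(\frac{1}{\sqrt{15}}, \frac{1}{\sqrt{15}}, -\frac{2}{\sqrt{15}}, \frac{3}{\sqrt{15}}\right)$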

24. Let $R^3$ have the Euclidean inner product. Find an orthonormal basis for the subspace spanned by (0, 1, 2),
(-1, 0, 1), (-1, 1, 3).

25. Let $R^3$ have the inner product

$\langle u, v \rangle = u_1v_1 + 2u_2v_2 + 3u_3v_3$

Use the Gram-Schmidt process to transform $u_1 = (1, 1, 1)$, $u_2 = (1, 1, 0)$, $u_3 = (1, 0, 0)$ into an orthonormal
basis.

Answer: 


vi~(/r k V2"(/^’ k h}V3~[k 


^ Let  R3  have  the  Euclidean  inner  product.  The  subspace  of  r}  spanned  by  the  vectors  ui  = 0,  — | and 

U2  = (0,  1,  0)  is  aplane  passing  through  the  origin.  Express  w=  (1,  2,  3)  in  the  form  w = wj  4 W2,  where  wj 
lies  in  the  plane  and  W2  is  perpendicular  to  the  plane. 

27. Repeat Exercise 26 with $u_1 = (1, 1, 1)$ and $u_2 = (2, 0, -1)$.

Answer:

$w_1 = \left(\frac{13}{14}, \frac{31}{14}, \frac{40}{14}\right)$, $w_2 = \left(\frac{1}{14}, -\frac{3}{14}, \frac{2}{14}\right)$

28. Let $R^4$ have the Euclidean inner product. Express the vector $w = (-1, 2, 6, 0)$ in the form $w = w_1 + w_2$,
where $w_1$ is in the space W spanned by $u_1 = (-1, 0, 1, 2)$ and $u_2 = (0, 1, 0, 1)$, and $w_2$ is orthogonal to W.

29. Find the QR-decomposition of the matrix, where possible.

(a)  |"1  -1] 


(b)  1 2 

0 1 


(c)  1 1 

-2  1 

2 1 

(d)  r 1 2 

0 1 1 

1 2 0 


(e) 


1 2 1 
1 1 1 
0 3 1 


(f) 


1 0 1 

-1  1 1 

1 0 1 

-1  1 1 


Answer: 


(a) 


J_ 2_ 

f 'f 


_2_  J_ 

f f5 


f f 

0 {5 


(b) 


0 


f 


f 


f 


f 3/2 
0 (3 


(c) 


1 

3 


2 

3 


2 

3 


8 


l/234 

11 

l/234 

7 

l/234 


1 

3 

l/26 

3 


(d)  I _L  _ J_  J_ 

{2  f f 

2 

fe 


0 ~T 
/3 

{2  1/3 


1 


/? 


{2  {2  {2 

0 

0 0^ 

f 


(e) 


0 


2/T9 

2/T9 

3 1/2 
1/I9 


0 

0 


(f)  Columns  not  linearly  independent 


30. In Step 3 of the proof of Theorem 6.3.5, it was stated that “the linear independence of $\{u_1, u_2, \ldots, u_n\}$ ensures
that $v_3 \neq 0$.” Prove this statement.

31.  Prove  that  the  diagonal  entries  of  R in  Formula  15  are  nonzero. 

32.  Calculus  required  Use  Theorem  6.3.2  a to  express  the  following  polynomials  as  linear  combinations  of  the  first 
three  Legendre  polynomials  (see  the  Remark  following  Example  8). 

(a)  1 -|-  x + Ax2 

(b)  2 -lx2 

(c)  4 + 3x 


33. Calculus required Let $P_2$ have the inner product

\[ \langle p, q \rangle = \int_0^1 p(x)\,q(x)\,dx \]

Apply the Gram-Schmidt process to transform the standard basis $S = \{1, x, x^2\}$ into an orthonormal basis.

Answer:

$v_1 = 1$, $v_2 = \sqrt{3}\,(2x - 1)$, $v_3 = \sqrt{5}\,(6x^2 - 6x + 1)$

34. Find vectors x and y in $R^2$ that are orthonormal with respect to the inner product $\langle u, v \rangle = 3u_1v_1 + 2u_2v_2$ but
are not orthonormal with respect to the Euclidean inner product.

True-False  Exercises 


In  parts  (a)-(f)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  Every  linearly  independent  set  of  vectors  in  an  inner  product  space  is  orthogonal. 

Answer: 

False 

(b)  Every  orthogonal  set  of  vectors  in  an  inner  product  space  is  linearly  independent. 

Answer: 

False 

(c)  Every  nontrivial  subspace  of  R has  an  orthonormal  basis  with  respect  to  the  Euclidean  inner  product. 
Answer: 

True 

(d)  Every  nonzero  finite-dimensional  inner  product  space  has  an  orthonormal  basis. 

Answer: 

True 

(e)  projw  x is  orthogonal  to  every  vector  of  W. 


Answer: 


False 

(f) If A is an n x n matrix with a nonzero determinant, then A has a QR-decomposition.
Answer: 

True 
6.4 Best Approximation; Least Squares

In  this  section  we  will  be  concerned  with  linear  systems  that  cannot  be  solved  exactly  and  for  which  an  approximate  solution  is 
needed.  Such  systems  commonly  occur  in  applications  where  measurement  errors  “perturb”  the  coefficients  of  a consistent  system 
sufficiently  to  produce  inconsistency. 


Least  Squares  Solutions  of  Linear  Systems 

Suppose that $Ax = b$ is an inconsistent linear system of m equations in n unknowns in which we suspect the inconsistency to be
caused by measurement errors in the coefficients of A. Since no exact solution is possible, we will look for a vector x that comes as
“close as possible” to being a solution in the sense that it minimizes $\|b - Ax\|$ with respect to the Euclidean inner product on $R^m$.
You can think of $Ax$ as an approximation to b and $\|b - Ax\|$ as the error in that approximation: the smaller the error, the better
the approximation. This leads to the following problem.


Least  Squares  Problem 

Given a linear system $Ax = b$ of m equations in n unknowns, find a vector x that minimizes $\|b - Ax\|$ with respect to the
Euclidean inner product on $R^m$. We call such an x a least squares solution of the system, we call $b - Ax$ the least squares
error vector, and we call $\|b - Ax\|$ the least squares error.


To clarify the above terminology, suppose that the matrix form of $b - Ax$ is

\[ b - Ax = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_m \end{bmatrix} \]

The term “least squares solution” results from the fact that minimizing $\|b - Ax\|$ also minimizes

\[ \|b - Ax\|^2 = e_1^2 + e_2^2 + \cdots + e_m^2 \]


Best  Approximation 


Suppose that b is a fixed vector in $R^3$ that we would like to approximate by a vector w that is required to lie in some subspace W
of $R^3$. Unless b happens to be in W, any such approximation will result in an “error vector” $b - w$ that cannot be made equal
to 0 no matter how w is chosen (Figure 6.4.1a). However, by choosing

\[ w = \operatorname{proj}_W b \]

we can make the length of the error vector

\[ \|b - w\| = \|b - \operatorname{proj}_W b\| \]

as small as possible (Figure 6.4.1b).


[Figure 6.4.1: (a) an arbitrary vector w in W and the error vector b - w; (b) the vector proj_W b in W and the error vector b - proj_W b]


These  geometric  ideas  suggest  the  following  general  theorem. 


THEOREM 6.4.1 Best Approximation Theorem

If W is a finite-dimensional subspace of an inner product space V, and if b is a vector in V, then $\operatorname{proj}_W b$ is the best
approximation to b from W in the sense that

\[ \|b - \operatorname{proj}_W b\| < \|b - w\| \]

for every vector w in W that is different from $\operatorname{proj}_W b$.


For every vector w in W, we can write

\[ b - w = (b - \operatorname{proj}_W b) + (\operatorname{proj}_W b - w) \tag{1} \]

But $\operatorname{proj}_W b - w$, being a difference of vectors in W, is itself in W; and since $b - \operatorname{proj}_W b$ is orthogonal to W, the two terms on the
right side of (1) are orthogonal. Thus, it follows from the Theorem of Pythagoras (Theorem 6.2.3) that

\[ \|b - w\|^2 = \|b - \operatorname{proj}_W b\|^2 + \|\operatorname{proj}_W b - w\|^2 \]

Since $w \neq \operatorname{proj}_W b$, it follows that the second term in this sum is positive, and hence that

\[ \|b - \operatorname{proj}_W b\|^2 < \|b - w\|^2 \]

Since norms are nonnegative, it follows (from a property of inequalities) that

\[ \|b - \operatorname{proj}_W b\| < \|b - w\| \]
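The inequality in the Best Approximation Theorem can be checked on a small case. The sketch below is only an illustration under assumed data: the plane W spanned by the orthogonal vectors (1, 1, 1) and (1, -1, 0) and the vector b = (1, 2, 3) are hypothetical choices, and the helper names are not from the text. It projects b onto W using the orthogonal-basis projection formula from Section 6.3 and compares the squared error with that of several other vectors in W.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj_onto_orthogonal_basis(b, basis):
    """Projection of b onto span(basis); the basis vectors must be mutually orthogonal."""
    p = [Fraction(0)] * len(b)
    for v in basis:
        c = Fraction(dot(b, v), dot(v, v))      # coefficient <b, v> / ||v||^2
        p = [pi + c * vi for pi, vi in zip(p, v)]
    return p

# Hypothetical data: an orthogonal basis for a plane W in R^3 and a vector b
v1, v2 = [1, 1, 1], [1, -1, 0]
b = [1, 2, 3]

p = proj_onto_orthogonal_basis(b, [v1, v2])
err = [bi - pi for bi, pi in zip(b, p)]
best = dot(err, err)                            # ||b - proj_W b||^2

# Squared distances from b to other vectors w = p + s*v1 + t*v2 in W
others = []
for s in range(-2, 3):
    for t in range(-2, 3):
        if (s, t) != (0, 0):
            w = [pi + s * a + t * c for pi, a, c in zip(p, v1, v2)]
            d = [bi - wi for bi, wi in zip(b, w)]
            others.append(dot(d, d))

print(p, best, min(others))
```

Exact fractions are used so that the strict inequality is not blurred by rounding; every competing w tested gives a strictly larger squared error, as the Pythagorean argument in the proof predicts.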


Least  Squares  Solutions  of  Linear  Systems 

One way to find a least squares solution of $Ax = b$ is to calculate the orthogonal projection $\operatorname{proj}_W b$ on the column space W of the
matrix A and then solve the equation

\[ Ax = \operatorname{proj}_W b \tag{2} \]

However, we can avoid the need to calculate the projection by rewriting (2) as

\[ b - Ax = b - \operatorname{proj}_W b \]

and then multiplying both sides of this equation by $A^T$ to obtain

\[ A^T(b - Ax) = A^T(b - \operatorname{proj}_W b) \tag{3} \]

Since $b - \operatorname{proj}_W b$ is the component of b that is orthogonal to the column space of A, it follows from Theorem 4.8.9b that this
vector lies in the null space of $A^T$ and hence that

\[ A^T(b - \operatorname{proj}_W b) = 0 \]

Thus, (3) simplifies to

\[ A^T(b - Ax) = 0 \]

which we can rewrite as

\[ A^T A x = A^T b \tag{4} \]

This is called the normal equation or the normal system associated with $Ax = b$. When viewed as a linear system, the individual
equations are called the normal equations associated with $Ax = b$.


In  summary,  we  have  established  the  following  result. 

THEOREM 6.4.2

For every linear system $Ax = b$, the associated normal system

\[ A^T A x = A^T b \tag{5} \]

is consistent, and all solutions of (5) are least squares solutions of $Ax = b$. Moreover, if W is the column space of A, and x is
any least squares solution of $Ax = b$, then the orthogonal projection of b on W is

\[ \operatorname{proj}_W b = Ax \tag{6} \]


If a linear system is consistent, then its exact solutions are
the same as its least squares solutions, in which case the
error is zero.

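EXAMPLE 1 Least Squares Solution

Find all least squares solutions of the linear system

\[
\begin{aligned}
x_1 - x_2 &= 4 \\
3x_1 + 2x_2 &= 1 \\
-2x_1 + 4x_2 &= 3
\end{aligned}
\]

Find the error vector and the error.

Solution

It will be convenient to express the system in the matrix form $Ax = b$, where

\[ A = \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} \]

It follows that

\[ A^T A = \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} = \begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix} \]

\[ A^T b = \begin{bmatrix} 1 & 3 & -2 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \end{bmatrix} \]

so the normal system $A^T A x = A^T b$ is

\[ \begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 10 \end{bmatrix} \]

Solving this system yields a unique least squares solution, namely,

\[ x_1 = \frac{17}{95}, \quad x_2 = \frac{143}{285} \]

The error vector is

\[ b - Ax = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} - \begin{bmatrix} 1 & -1 \\ 3 & 2 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} \frac{17}{95} \\ \frac{143}{285} \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \\ 3 \end{bmatrix} - \begin{bmatrix} -\frac{92}{285} \\ \frac{439}{285} \\ \frac{94}{57} \end{bmatrix} = \begin{bmatrix} \frac{1232}{285} \\ -\frac{154}{285} \\ \frac{77}{57} \end{bmatrix} \]

and the error is

\[ \|b - Ax\| = \frac{77}{\sqrt{285}} \approx 4.561 \]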
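The arithmetic in Example 1 can be reproduced exactly with rational numbers. The following is a minimal sketch, not a production solver: the helper names `transpose`, `matmul`, and `solve` are introduced here for illustration, and `solve` is plain Gauss-Jordan elimination that assumes the coefficient matrix is square and invertible.

```python
from fractions import Fraction

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination (M square and invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)   # find a nonzero pivot
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

A = [[1, -1], [3, 2], [-2, 4]]
b = [4, 1, 3]

At = transpose(A)
AtA = matmul(At, A)                                          # coefficient matrix of the normal system
Atb = [sum(a * bi for a, bi in zip(row, b)) for row in At]   # right side of the normal system
x = solve(AtA, Atb)                                          # least squares solution

Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
error_vector = [bi - v for bi, v in zip(b, Ax)]
error_squared = sum(e * e for e in error_vector)             # ||b - Ax||^2

print(x, error_vector, error_squared)
```

Working with `Fraction` keeps the entries 17/95 and 143/285 exact; the squared error comes out as 5929/285, whose square root is 77/√285.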


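EXAMPLE 2 Orthogonal Projection on a Subspace

Find the orthogonal projection of the vector $u = (-3, -3, 8, 9)$ on the subspace of $R^4$ spanned by the vectors

$u_1 = (3, 1, 0, 1)$, $u_2 = (1, 2, 1, 1)$, $u_3 = (-1, 0, 2, -1)$

We could solve this problem by first using the Gram-Schmidt process to convert $\{u_1, u_2, u_3\}$ into an
orthonormal basis and then applying the method used in Example 6 of Section 6.3. However, the following method
is more efficient.

The subspace W of $R^4$ spanned by $u_1$, $u_2$, and $u_3$ is the column space of the matrix

\[ A = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix} \]

Thus, if u is expressed as a column vector, we can find the orthogonal projection of u on W by finding a least
squares solution of the system $Ax = u$ and then calculating $\operatorname{proj}_W u = Ax$ from the least squares solution. The
computations are as follows: The system $Ax = u$ is

\[ \begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -3 \\ -3 \\ 8 \\ 9 \end{bmatrix} \]

so

\[ A^T A = \begin{bmatrix} 3 & 1 & 0 & 1 \\ 1 & 2 & 1 & 1 \\ -1 & 0 & 2 & -1 \end{bmatrix} \begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 11 & 6 & -4 \\ 6 & 7 & 0 \\ -4 & 0 & 6 \end{bmatrix} \]

\[ A^T u = \begin{bmatrix} 3 & 1 & 0 & 1 \\ 1 & 2 & 1 & 1 \\ -1 & 0 & 2 & -1 \end{bmatrix} \begin{bmatrix} -3 \\ -3 \\ 8 \\ 9 \end{bmatrix} = \begin{bmatrix} -3 \\ 8 \\ 10 \end{bmatrix} \]

The normal system $A^T A x = A^T u$ in this case is

\[ \begin{bmatrix} 11 & 6 & -4 \\ 6 & 7 & 0 \\ -4 & 0 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -3 \\ 8 \\ 10 \end{bmatrix} \]

Solving this system yields

\[ x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} \]

as the least squares solution of $Ax = u$ (verify), so

\[ \operatorname{proj}_W u = Ax = \begin{bmatrix} 3 & 1 & -1 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \\ 1 & 1 & -1 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 3 \\ 4 \\ 0 \end{bmatrix} \]

or, in comma-delimited notation, $\operatorname{proj}_W u = (-2, 3, 4, 0)$.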
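The “verify” step in Example 2 can be carried out mechanically. The sketch below, with illustrative variable names that are not from the text, checks that x = (-1, 2, 1) satisfies the normal system and that the residual u - Ax is orthogonal to each spanning vector, which is what makes Ax the orthogonal projection.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

u = [-3, -3, 8, 9]
u1, u2, u3 = [3, 1, 0, 1], [1, 2, 1, 1], [-1, 0, 2, -1]
cols = [u1, u2, u3]                       # columns of A

# Entries of A^T A and A^T u are dot products of the columns
AtA = [[dot(ci, cj) for cj in cols] for ci in cols]
Atu = [dot(ci, u) for ci in cols]

x = [-1, 2, 1]                            # candidate least squares solution
lhs = [dot(row, x) for row in AtA]        # (A^T A) x

# proj_W u = Ax is the combination x1*u1 + x2*u2 + x3*u3
proj = [sum(xj * c[i] for xj, c in zip(x, cols)) for i in range(4)]
residual = [ui - pi for ui, pi in zip(u, proj)]
orthogonality = [dot(residual, c) for c in cols]

print(AtA, Atu, proj, orthogonality)
```

Because every quantity here is an integer, no rounding is involved; the three zero dot products confirm that u - proj_W u lies in the orthogonal complement of W.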


Uniqueness  of  Least  Squares  Solutions 

In  general,  least  squares  solutions  of  linear  systems  are  not  unique.  Although  the  linear  system  in  Example  1 turned  out  to  have  a 
unique  least  squares  solution,  that  occurred  only  because  the  coefficient  matrix  of  the  system  happened  to  satisfy  certain  conditions 
that  guarantee  uniqueness.  Our  next  theorem  will  show  what  those  conditions  are. 


THEOREM  6.4.3 

If A is an m x n matrix, then the following are equivalent.

(a) A has linearly independent column vectors.

(b) $A^T A$ is invertible.

We will prove that (a) ⇒ (b) and leave the proof that (b) ⇒ (a) as an exercise.

(a) ⇒ (b) Assume that A has linearly independent column vectors. The matrix $A^T A$ has size n x n, so we can prove that this
matrix is invertible by showing that the linear system $A^T A x = 0$ has only the trivial solution. But if x is any solution of this
system, then $Ax$ is in the null space of $A^T$ and also in the column space of A. By Theorem 4.8.9b these spaces are orthogonal
complements, so part (b) of Theorem 6.2.4 implies that $Ax = 0$. But A is assumed to have linearly independent column vectors, so
$x = 0$ by Theorem 1.3.1.
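As a small illustration of how the equivalence fails in the other direction, consider a matrix chosen for this sketch (it is an assumed example, not one worked in the text) whose second column is three times its first:

```latex
\[
A = \begin{bmatrix} 1 & 3 \\ -2 & -6 \\ 3 & 9 \end{bmatrix},
\qquad
A^T A = \begin{bmatrix} 14 & 42 \\ 42 & 126 \end{bmatrix},
\qquad
\det(A^T A) = 14 \cdot 126 - 42^2 = 1764 - 1764 = 0
\]
```

The columns of A are linearly dependent and $A^T A$ is not invertible, so neither condition of the theorem holds; the normal system for such an A is still consistent by Theorem 6.4.2, but it has infinitely many solutions.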

As  an  exercise,  try  using  Formula  7 to  solve  the  problem 
in  part  (a)  of  Example  1 . 


The  next  theorem,  which  follows  directly  from  Theorem  6.4.2  and  Theorem  6.4.3,  gives  an  explicit  formula  for  the  least  squares 
solution  of  a linear  system  in  which  the  coefficient  matrix  has  linearly  independent  column  vectors. 


THEOREM  6.4.4 

If A is an m x n matrix with linearly independent column vectors, then for every m x 1 matrix b, the linear system $Ax = b$
has a unique least squares solution. This solution is given by

\[ x = (A^T A)^{-1} A^T b \tag{7} \]

Moreover, if W is the column space of A, then the orthogonal projection of b on W is

\[ \operatorname{proj}_W b = Ax = A (A^T A)^{-1} A^T b \tag{8} \]
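As a worked check of Formula 7, the computation below applies it to the matrices $A^T A$ and $A^T b$ found in Example 1; the only step added here is the explicit 2 x 2 inverse.

```latex
\[
(A^T A)^{-1}
= \begin{bmatrix} 14 & -3 \\ -3 & 21 \end{bmatrix}^{-1}
= \frac{1}{14 \cdot 21 - (-3)(-3)} \begin{bmatrix} 21 & 3 \\ 3 & 14 \end{bmatrix}
= \frac{1}{285} \begin{bmatrix} 21 & 3 \\ 3 & 14 \end{bmatrix}
\]
\[
x = (A^T A)^{-1} A^T b
= \frac{1}{285} \begin{bmatrix} 21 & 3 \\ 3 & 14 \end{bmatrix} \begin{bmatrix} 1 \\ 10 \end{bmatrix}
= \frac{1}{285} \begin{bmatrix} 51 \\ 143 \end{bmatrix}
= \begin{bmatrix} \frac{17}{95} \\[2pt] \frac{143}{285} \end{bmatrix}
\]
```

This agrees with the solution obtained in Example 1 by solving the normal system directly.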


OPTIONAL

The Role of QR-Decomposition in Least Squares Problems

Formulas 7 and 8 have theoretical use but are not well suited for numerical computation. In practice, least squares solutions of
$Ax = b$ are typically found by using some variation of Gaussian elimination to solve the normal equations or by using
QR-decomposition and the following theorem.


THEOREM 6.4.5

If A is an m x n matrix with linearly independent column vectors, and if $A = QR$ is a QR-decomposition of A (see Theorem
6.3.7), then for each b in $R^m$ the system $Ax = b$ has a unique least squares solution given by

\[ x = R^{-1} Q^T b \tag{9} \]


A proof of this theorem and a discussion of its use can be found in many books on numerical methods of linear algebra. However,
you can obtain Formula 9 by making the substitution $A = QR$ in (7) and using the fact that $Q^T Q = I$ to obtain

\[
\begin{aligned}
x &= \left((QR)^T (QR)\right)^{-1} (QR)^T b \\
  &= (R^T Q^T Q R)^{-1} (QR)^T b \\
  &= R^{-1} (R^T)^{-1} R^T Q^T b \\
  &= R^{-1} Q^T b
\end{aligned}
\]
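Formula 9 can be tried out numerically. The sketch below is an assumption-laden illustration rather than the method a numerical library would use: it builds Q and R with the classical Gram-Schmidt process (numerical software usually prefers more stable variants such as Householder reflections), and it solves $Rx = Q^T b$ by back substitution instead of forming $R^{-1}$. The function names are hypothetical.

```python
import math

def qr_gram_schmidt(A):
    """Return (Q, R) with A = QR; the columns of A must be linearly independent."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j, u in enumerate(cols):
        v = [float(t) for t in u]
        for i, q in enumerate(q_cols):
            R[i][j] = sum(a * b for a, b in zip(u, q))        # <u_j, q_i>
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]   # remove the q_i component
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))
        q_cols.append([vk / R[j][j] for vk in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R

def least_squares_qr(A, b):
    """Least squares solution x = R^{-1} Q^T b, computed by back substitution."""
    Q, R = qr_gram_schmidt(A)
    n = len(R)
    y = [sum(Q[i][j] * b[i] for i in range(len(b))) for j in range(n)]   # y = Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x

# The system from Example 1
x = least_squares_qr([[1, -1], [3, 2], [-2, 4]], [4, 1, 3])
print(x)
```

Applied to the system of Example 1, the result agrees with 17/95 and 143/285 up to floating-point rounding.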

Orthogonal  Projections  on  Subspaces  of  Rm 

In  Section  4.8  we  showed  how  to  compute  orthogonal  projections  on  the  coordinate  axes  of  a rectangular  coordinate  system  in  R 3 
and  more  generally  on  lines  through  the  origin  of  R-'.  We  will  now  consider  the  problem  of  finding  orthogonal  projections  on 
subspaces  of  Rm.  We  begin  with  the  following  definition. 


DEFINITION  1 

If W is a subspace of $R^m$, then the linear transformation $P : R^m \to W$ that maps each vector x in $R^m$ into its orthogonal
projection $\operatorname{proj}_W x$ in W is called the orthogonal projection of $R^m$ on W.


It follows from Formula 7 that the standard matrix for the transformation P is

\[ [P] = A (A^T A)^{-1} A^T \tag{10} \]

where A is constructed using any basis for W as its column vectors.
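Formula 10 translates directly into a short computation. The following is a sketch under stated assumptions: the helper names are hypothetical, exact `Fraction` arithmetic is used so the checks are exact, and the subspace is the one from Example 2. It builds $[P]$ and confirms that it reproduces the projection found there, and that $[P]$ is symmetric and satisfies $[P]^2 = [P]$.

```python
from fractions import Fraction

def transpose(M):
    return [list(r) for r in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse(M):
    """Inverse of a square invertible matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def projection_matrix(A):
    """Standard matrix A (A^T A)^{-1} A^T; the columns of A must be a basis for W."""
    At = transpose(A)
    return matmul(matmul(A, inverse(matmul(At, A))), At)

# Columns are the spanning vectors u1, u2, u3 from Example 2
A = [[3, 1, -1], [1, 2, 0], [0, 1, 2], [1, 1, -1]]
P = projection_matrix(A)

u = [-3, -3, 8, 9]
Pu = [sum(p * ui for p, ui in zip(row, u)) for row in P]
print(Pu)
```

Once $[P]$ has been computed for a subspace, projecting any further vector is a single matrix-vector product, which is the practical advantage of Formula 10 over re-solving the normal system each time.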

EXAMPLE 3 The Standard Matrix for an Orthogonal Projection on a Line

We showed in Formula 16 of Section 4.9 that

\[ P_\theta = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} \]

is the standard matrix for the orthogonal projection on the line W through the origin of $R^2$ that makes an angle $\theta$ with
the positive x-axis. Derive this result using Formula 10.

The column vectors of A can be formed from any basis for W. Since W is one-dimensional, we can take
$w = (\cos\theta, \sin\theta)$ as the basis vector (Figure 6.4.2), so

\[ A = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \]

We leave it for you to show that $A^T A$ is the 1 x 1 identity matrix. Thus, Formula 10 simplifies to

\[ [P] = A (A^T A)^{-1} A^T = A A^T = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} = P_\theta \]
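The matrix derived in Example 3 is easy to test numerically. In this sketch the angle π/6 and the function names are illustrative assumptions; it checks that the matrix fixes the direction vector of the line and sends the perpendicular direction to zero, which is what an orthogonal projection onto the line must do.

```python
import math

def line_projection_matrix(theta):
    """Matrix A A^T for A = [cos(theta), sin(theta)]^T, as in Example 3."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, s * c],
            [s * c, s * s]]

def apply(P, v):
    return [sum(p * x for p, x in zip(row, v)) for row in P]

theta = math.pi / 6                           # assumed sample angle
P = line_projection_matrix(theta)

w = [math.cos(theta), math.sin(theta)]        # unit vector along the line
n = [-math.sin(theta), math.cos(theta)]       # unit vector perpendicular to the line

Pw = apply(P, w)                              # should reproduce w
Pn = apply(P, n)                              # should be the zero vector
print(Pw, Pn)
```

Since any vector in $R^2$ is a combination of w and n, these two checks determine the action of the matrix completely.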


Another  View  of  Least  Squares 

Recall from Theorem 4.8.9 that the null space and row space of an m x n matrix A are orthogonal complements, as are the null
space of $A^T$ and the column space of A. Thus, given a linear system $Ax = b$ in which A is an m x n matrix, the Projection
Theorem (6.3.3) tells us that the vectors x and b can each be decomposed into sums of orthogonal terms as

\[ x = x_{\operatorname{row}(A)} + x_{\operatorname{null}(A)} \qquad b = b_{\operatorname{null}(A^T)} + b_{\operatorname{col}(A)} \]

where $x_{\operatorname{row}(A)}$ and $x_{\operatorname{null}(A)}$ are the orthogonal projections of x on the row space of A and the null space of A, and the vectors
$b_{\operatorname{null}(A^T)}$ and $b_{\operatorname{col}(A)}$ are the orthogonal projections of b on the null space of $A^T$ and the column space of A.

In Figure 6.4.3 we have represented the fundamental spaces of A by perpendicular lines in $R^n$ and $R^m$ on which we indicated the
orthogonal projections of x and b. (This, of course, is only pictorial since the fundamental spaces need not be one-dimensional.)
The figure shows $Ax$ as a point in the column space of A and conveys that $b_{\operatorname{col}(A)}$ is the point in col(A) that is closest to b. This
illustrates that the least squares solutions of $Ax = b$ are the exact solutions of the equation $Ax = b_{\operatorname{col}(A)}$.
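The decomposition of b can be made concrete with the data of Example 1. The sketch below takes the least squares solution (17/95, 143/285) from that example as given; the variable names are illustrative. It forms the column-space component $A\hat{x}$ and the remaining component, then checks that the latter lies in the null space of $A^T$ and is orthogonal to the former.

```python
from fractions import Fraction as F

A = [[1, -1], [3, 2], [-2, 4]]
b = [4, 1, 3]
x_hat = [F(17, 95), F(143, 285)]          # least squares solution from Example 1

# Component of b in col(A): the product A x_hat
b_col = [sum(a * x for a, x in zip(row, x_hat)) for row in A]

# Remaining component, which should lie in null(A^T)
b_null = [bi - ci for bi, ci in zip(b, b_col)]

# A^T b_null, computed column by column
At_b_null = [sum(A[i][j] * b_null[i] for i in range(3)) for j in range(2)]

# The two components should be orthogonal
inner = sum(p * q for p, q in zip(b_col, b_null))
print(b_col, b_null, At_b_null, inner)
```

Here `b_null` is exactly the least squares error vector of Example 1, which is why minimizing the error amounts to solving $Ax = b_{\operatorname{col}(A)}$.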

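[Figure 6.4.3: the fundamental spaces of A drawn as perpendicular lines, with row(A) and null(A) in R^n and col(A) and null(A^T) in R^m, showing the orthogonal projections of x and b and the point Ax in col(A)]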


More  on  the  Equivalence  Theorem 

As  our  final  result  in  the  main  part  of  this  section  we  will  add  one  additional  part  to  Theorem  5.1.6. 


Equivalent Statements

If A is an n x n matrix, then the following statements are equivalent.

(a) A is invertible.

(b) $Ax = 0$ has only the trivial solution.

(c) The reduced row echelon form of A is $I_n$.

(d) A is expressible as a product of elementary matrices.

(e) $Ax = b$ is consistent for every n x 1 matrix b.

(f) $Ax = b$ has exactly one solution for every n x 1 matrix b.

(g) $\det(A) \neq 0$.

(h) The column vectors of A are linearly independent.

(i) The row vectors of A are linearly independent.

(j) The column vectors of A span $R^n$.

(k) The row vectors of A span $R^n$.

(l) The column vectors of A form a basis for $R^n$.

(m) The row vectors of A form a basis for $R^n$.

(n) A has rank n.

(o) A has nullity 0.

(p) The orthogonal complement of the null space of A is $R^n$.

(q) The orthogonal complement of the row space of A is {0}.

(r) The range of $T_A$ is $R^n$.

(s) $T_A$ is one-to-one.

(t) $\lambda = 0$ is not an eigenvalue of A.

(u) $A^T A$ is invertible.


The proof of part (u) follows from part (h) of this theorem and Theorem 6.4.3 applied to square matrices.


OPTIONAL 


We now have all the ingredients needed to prove Theorem 6.3.3 in the special case where V is the vector space $R^m$.

We will leave the case where W = {0} as an exercise, so assume that W ≠ {0}. Let
$\{v_1, v_2, \ldots, v_k\}$ be any basis for W, and form the m x k matrix M that has these basis vectors as successive columns. This makes
W the column space of M and hence $W^\perp$ the null space of $M^T$. We will complete the proof by showing that every vector u in $R^m$
can be written in exactly one way as

\[ u = w_1 + w_2 \]

where $w_1$ is in the column space of M and $M^T w_2 = 0$. However, to say that $w_1$ is in the column space of M is equivalent to saying
$w_1 = Mx$ for some vector x in $R^k$, and to say that $M^T w_2 = 0$ is equivalent to saying that $M^T(u - w_1) = 0$. Thus, if we can
show that the equation

\[ M^T(u - Mx) = 0 \tag{11} \]

has a unique solution for x, then $w_1 = Mx$ and $w_2 = u - w_1$ will be uniquely determined vectors with the required properties. To
do this, let us rewrite (11) as

\[ M^T M x = M^T u \]

Since the matrix M has linearly independent column vectors, the matrix $M^T M$ is invertible by Theorem 6.4.3, and hence the
equation has a unique solution as required to complete the proof.


Concept  Review 

Least  squares  problem 
Least  squares  solution 
Least  squares  error  vector 
Least  squares  error 
Best  approximation 
Normal  equation 
Orthogonal  projection 

Skills 

Find  the  least  squares  solution  of  a linear  system. 

Find  the  error  and  error  vector  associated  with  a least  squares  solution  to  a linear  system. 
Use  the  techniques  developed  in  this  section  to  compute  orthogonal  projections. 

Find  the  standard  matrix  of  an  orthogonal  projection. 


Exercise  Set  6.4 

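1. Find the normal system associated with the given linear system.

(a) \[ \begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 4 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix} \]

(b) \[ \begin{bmatrix} 2 & -1 & 0 \\ 3 & 1 & 2 \\ -1 & 4 & 5 \\ 1 & 2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 1 \\ 2 \end{bmatrix} \]

Answer:

(a) \[ \begin{bmatrix} 21 & 25 \\ 25 & 35 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 20 \\ 20 \end{bmatrix} \]

(b) \[ \begin{bmatrix} 15 & -1 & 5 \\ -1 & 22 & 30 \\ 5 & 30 & 45 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 9 \\ 13 \end{bmatrix} \]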


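In Exercises 2-4, find the least squares solution of the linear equation $Ax = b$.

2. (a) \[ A = \begin{bmatrix} 1 & -1 \\ 2 & 3 \\ 4 & 5 \end{bmatrix}; \quad b = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix} \]

(b) \[ A = \begin{bmatrix} 2 & -2 \\ 1 & 1 \\ 3 & 1 \end{bmatrix}; \quad b = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix} \]

3. (a) \[ A = \begin{bmatrix} 1 & 1 \\ -1 & 1 \\ -1 & 2 \end{bmatrix}; \quad b = \begin{bmatrix} 7 \\ 0 \\ -7 \end{bmatrix} \]

(b) \[ A = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 1 & -2 \\ 1 & 1 & 0 \\ 1 & 1 & -1 \end{bmatrix}; \quad b = \begin{bmatrix} 6 \\ 0 \\ 9 \\ 3 \end{bmatrix} \]

Answer:

(a) $x_1 = 5$, $x_2 = \frac{1}{2}$

(b) $x_1 = 12$, $x_2 = -3$, $x_3 = 9$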


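4. (a) \[ A = \begin{bmatrix} 3 & 2 & -1 \\ 1 & -4 & 3 \\ 1 & 10 & -7 \end{bmatrix}; \quad b = \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix} \]

(b) \[ A = \begin{bmatrix} 2 & 0 & -1 \\ 1 & -2 & 2 \\ 2 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix}; \quad b = \begin{bmatrix} 0 \\ 6 \\ 0 \\ 6 \end{bmatrix} \]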

In Exercises 5-6, find the least squares error vector $e = b - Ax$ resulting from the least squares solution x and verify that it is
orthogonal to the column space of A.

5. (a) A and b are as in Exercise 3(a).

(b) A and b are as in Exercise 3(b).

Answer:

(a) \[ e = \begin{bmatrix} \frac{3}{2} \\[2pt] \frac{9}{2} \\[2pt] -3 \end{bmatrix} \]

(b) \[ e = \begin{bmatrix} 3 \\ -3 \\ 0 \\ 3 \end{bmatrix} \]

6. (a) A and b are as in Exercise 4(a).

(b) A and b are as in Exercise 4(b).

7. Find all least squares solutions of $Ax = b$ and confirm that all of the solutions have the same error vector. Compute the least
squares error.


(a) 

2 

r 

'3' 

A = 

4 

2 

; b = 

2 

-2 

1 

1 

(b) 

1 

3' 

T 

A = 

-2 

-6 

; b = 

0 

3 

9 

1 

(c) 

-1  3 2 

7" 

A = 

2 1 3 

;b  = 

0 

0 1 1 

-7 

Answer: 

(a)  Solution:  x=  -7  j;  least  squares  error: 

(k)  Solution:  x = 0 '|  =M(  — 3,  1)  (t  a real  number);  least  squares  error:  ^-^42 

(c)  Solution:  x = y,  j,  oj  =M(  — 1,  — 1,  1)  (t  a real  number);  least  squares  error:  ^-^294 

8. Find the orthogonal projection of u on the subspace of $R^3$ spanned by the vectors $v_1$ and $v_2$.

(a) $u = (2, 1, 3)$; $v_1 = (1, 1, 0)$, $v_2 = (1, 2, 1)$

(b) $u = (1, -6, 1)$; $v_1 = (-1, 2, 1)$, $v_2 = (2, 2, 4)$

9. Find the orthogonal projection of u on the subspace of $R^4$ spanned by the vectors $v_1$, $v_2$, and $v_3$.

(a) $u = (6, 3, 9, 6)$; $v_1 = (2, 1, 1, 1)$, $v_2 = (1, 0, 1, 1)$, $v_3 = (-2, -1, 0, -1)$

(b) $u = (-2, 0, 2, 4)$; $v_1 = (1, 1, 3, 0)$, $v_2 = (-2, -1, -2, 1)$, $v_3 = (-3, -1, 1, 3)$


Answer: 


(a)  (7,  2,  9,  5) 

(b)  (12  _ 4 12 


10. Find the orthogonal projection of $u = (5, 6, 7, 2)$ on the solution space of the homogeneous linear system

\[
\begin{aligned}
x_1 + x_2 + x_3 &= 0 \\
2x_2 + x_3 + x_4 &= 0
\end{aligned}
\]

11. In each part, find $\det(A^T A)$, and apply Theorem 6.4.3 to determine whether A has linearly independent column
vectors.

(a) \[ A = \begin{bmatrix} -1 & 3 & 2 \\ 2 & 1 & 3 \\ 0 & 1 & 1 \end{bmatrix} \]

(b) \[ A = \begin{bmatrix} 2 & -1 & 3 \\ 0 & 1 & 1 \\ -1 & 0 & -2 \\ 4 & -5 & 3 \end{bmatrix} \]

Answer:

(a) $\det(A^T A) = 0$; A does not have linearly independent column vectors.

(b) $\det(A^T A) = 0$; A does not have linearly independent column vectors.

12. Use Formula 10 and the method of Example 3 to find the standard matrix for the orthogonal projection $P : R^2 \to R^2$ onto

(a) the x-axis.

(b) the y-axis.

[Note: Compare your results to Table 3 of Section 4.9.]

13. Use Formula 10 and the method of Example 3 to find the standard matrix for the orthogonal projection $P : R^3 \to R^3$ onto

(a) the xz-plane.

(b) the yz-plane.

[Note: Compare your results to Table 4 of Section 4.9.]

Answer:

(a) \[ [P] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

(b) \[ [P] = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

14. Show that if $w = (a, b, c)$ is a nonzero vector, then the standard matrix for the orthogonal projection of $R^3$ on the line
span{w} is

\[ [P] = \frac{1}{a^2 + b^2 + c^2} \begin{bmatrix} a^2 & ab & ac \\ ab & b^2 & bc \\ ac & bc & c^2 \end{bmatrix} \]

15. Let W be the plane with equation $5x - 3y + z = 0$.

(a) Find a basis for W.

(b) Use Formula 10 to find the standard matrix for the orthogonal projection on W.

(c) Use the matrix obtained in part (b) to find the orthogonal projection of a point $P_0(x_0, y_0, z_0)$ on W.

(d) Find the distance between the point $P_0(1, -2, 4)$ and the plane W, and check your result using Theorem 3.3.4.

Answer:

(a) (1, 0, -5), (0, 1, 3)

(b) \[ [P] = \frac{1}{35} \begin{bmatrix} 10 & 15 & -5 \\ 15 & 26 & 3 \\ -5 & 3 & 34 \end{bmatrix} \]

(c) \[ \left( \frac{2x_0 + 3y_0 - z_0}{7}, \; \frac{15x_0 + 26y_0 + 3z_0}{35}, \; \frac{-5x_0 + 3y_0 + 34z_0}{35} \right) \]

(d) $\frac{3\sqrt{35}}{7}$

16. Let W be the line with parametric equations

$x = 2t$, $y = -t$, $z = 4t$

(a) Find a basis for W.

(b) Use Formula 10 to find the standard matrix for the orthogonal projection on W.

(c) Use the matrix obtained in part (b) to find the orthogonal projection of a point $P_0(x_0, y_0, z_0)$ on W.

(d) Find the distance between the point $P_0(2, 1, -3)$ and the line W.

17. In $R^3$, consider the line l given by the equations

$x = t$, $y = t$, $z = t$

and the line m given by the equations

$x = s$, $y = 2s - 1$, $z = 1$

Let P be a point on l, and let Q be a point on m. Find the values of t and s that minimize the distance between the lines by
minimizing the squared distance $\|P - Q\|^2$.

Answer:
s = t = 1
18. Prove: If A has linearly independent column vectors, and if Ax = b is consistent, then the least squares solution of Ax = b and the exact solution of Ax = b are the same.

19. Prove: If A has linearly independent column vectors, and if b is orthogonal to the column space of A, then the least squares solution of Ax = b is x = 0.

20. Let $P: R^m \to W$ be the orthogonal projection of $R^m$ onto a subspace W.

(a) Prove that $[P]^2 = [P]$.

(b) What does the result in part (a) imply about the composition $P \circ P$?

(c) Show that [P] is symmetric.

21. Let A be an m × n matrix with linearly independent row vectors. Find a standard matrix for the orthogonal projection of $R^n$ onto the row space of A. [Hint: Start with Formula 10.]

Answer:

$[P] = A^T(AA^T)^{-1}A$

22.  Prove  the  implication  (b)  =>  (a)  of  Theorem  6.4.3. 

True-False  Exercises 

In  parts  (a)-(h)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a) If A is an m × n matrix, then $A^TA$ is a square matrix.

Answer:

True

(b) If $A^TA$ is invertible, then A is invertible.

Answer:

False

(c) If A is invertible, then $A^TA$ is invertible.

Answer:

True

(d) If Ax = b is a consistent linear system, then $A^TA\mathbf{x} = A^T\mathbf{b}$ is also consistent.

Answer:

True

(e) If Ax = b is an inconsistent linear system, then $A^TA\mathbf{x} = A^T\mathbf{b}$ is also inconsistent.

Answer: 

False 

(f)  Every  linear  system  has  a least  squares  solution. 

Answer: 

True 

(g)  Every  linear  system  has  a unique  least  squares  solution. 

Answer: 

False 

(h)  If  A is  an  m x n matrix  with  linearly  independent  columns  and  b is  in  Rm,  then  Ax  = b has  a unique  least  squares  solution. 
Answer: 

True 
6.5 Least Squares Fitting to Data

In  this  section  we  will  use  results  about  orthogonal  projections  in  inner  product  spaces  to  obtain  a technique 
for  fitting  a line  or  other  polynomial  curve  to  a set  of  experimentally  determined  points  in  the  plane. 


Fitting  a Curve  to  Data 

A common problem in experimental work is to obtain a mathematical relationship y = f(x) between two variables x and y by “fitting” a curve to points in the plane corresponding to various experimentally determined values of x and y, say

$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n)$$

On the basis of theoretical considerations or simply by observing the pattern of the points, the experimenter decides on the general form of the curve y = f(x) to be fitted. Some possibilities are (Figure 6.5.1)

(a) A straight line: $y = a + bx$

(b) A quadratic polynomial: $y = a + bx + cx^2$

(c) A cubic polynomial: $y = a + bx + cx^2 + dx^3$


Because  the  points  are  obtained  experimentally,  there  is  often  some  measurement  “error”  in  the  data,  making 
it  impossible  to  find  a curve  of  the  desired  form  that  passes  through  all  the  points.  Thus,  the  idea  is  to  choose 
the  curve  (by  determining  its  coefficients)  that  “best”  fits  the  data.  We  begin  with  the  simplest  and  most 
common  case:  fitting  a straight  line  to  data  points. 


Figure 6.5.1: (a) $y = a + bx$; (b) $y = a + bx + cx^2$; (c) $y = a + bx + cx^2 + dx^3$


Least  Squares  Fit  of  a Straight  Line 

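Suppose we want to fit a straight line y = a + bx to the experimentally determined points

$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n)$$

If the data points were collinear, the line would pass through all n points, and the unknown coefficients a and b would satisfy the equations

$$\begin{aligned} y_1 &= a + bx_1 \\ y_2 &= a + bx_2 \\ &\ \,\vdots \\ y_n &= a + bx_n \end{aligned}$$

We can write this system in matrix form as

$$\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

or more compactly as

$$M\mathbf{v} = \mathbf{y} \tag{1}$$

where

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} a \\ b \end{bmatrix} \tag{2}$$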


If the data points are not collinear, then it is impossible to find coefficients a and b that satisfy system 1 exactly; that is, the system is inconsistent. In this case we will look for a least squares solution

$$\mathbf{v} = \mathbf{v}^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix}$$

We call a line $y = a^* + b^*x$ whose coefficients come from a least squares solution a regression line or a least squares straight line fit to the data. To explain this terminology, recall that a least squares solution of 1 minimizes

$$\|\mathbf{y} - M\mathbf{v}\| \tag{3}$$

If we express the square of 3 in terms of components, we obtain

$$\|\mathbf{y} - M\mathbf{v}\|^2 = (y_1 - a - bx_1)^2 + (y_2 - a - bx_2)^2 + \cdots + (y_n - a - bx_n)^2 \tag{4}$$

If we now let

$$d_1 = |y_1 - a - bx_1|,\quad d_2 = |y_2 - a - bx_2|,\quad \ldots,\quad d_n = |y_n - a - bx_n|$$

then 4 can be written as

$$\|\mathbf{y} - M\mathbf{v}\|^2 = d_1^2 + d_2^2 + \cdots + d_n^2 \tag{5}$$

As illustrated in Figure 6.5.2, the number $d_i$ can be interpreted as the vertical distance between the line y = a + bx and the data point $(x_i, y_i)$. This distance is a measure of the “error” at the point $(x_i, y_i)$ resulting from the inexact fit of y = a + bx to the data points, the assumption being that the $x_i$ are known exactly and that all the error is in the measurement of the $y_i$. Since 3 and 5 are minimized by the same vector $\mathbf{v}^*$, the least squares straight line fit minimizes the sum of the squares of the estimated errors $d_i$; hence the name least squares straight line fit.

Figure 6.5.2: $d_i$ measures the vertical error in the least squares straight line.
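To make the sum of squared vertical distances concrete, the following Python sketch evaluates it for two candidate lines. The function name `sum_squared_errors` and the sample points are illustrative assumptions, not part of the text.

```python
def sum_squared_errors(points, a, b):
    """Return d_1^2 + ... + d_n^2 for the line y = a + b*x."""
    return sum((y - a - b * x) ** 2 for x, y in points)

# Illustrative sample points (an assumption made for this sketch).
points = [(0, 1), (1, 3), (2, 4), (3, 4)]

error_line_1 = sum_squared_errors(points, 1.5, 1.0)  # candidate y = 1.5 + x
error_line_2 = sum_squared_errors(points, 1.0, 1.0)  # candidate y = 1 + x
print(error_line_1, error_line_2)
```

Of these two candidates the first has the smaller sum of squared errors. A least squares straight line fit is the line for which this sum is as small as possible over all choices of a and b.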


Normal  Equations 

Recall from Theorem 6.4.2 that the least squares solutions of 1 can be obtained by solving the associated normal system

$$M^TM\mathbf{v} = M^T\mathbf{y}$$

the equations of which are called the normal equations.

In the exercises it will be shown that the column vectors of M are linearly independent if and only if the n data points do not lie on a vertical line in the xy-plane. In this case it follows from Theorem 6.4.4 that the least squares solution is unique and is given by

$$\mathbf{v}^* = (M^TM)^{-1}M^T\mathbf{y}$$

In  summary,  we  have  the  following  theorem. 


Uniqueness  of  the  Least  Squares  Solution 


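Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of two or more data points, not all lying on a vertical line, and let

$$M = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \quad \text{and} \quad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

Then there is a unique least squares straight line fit

$$y = a^* + b^*x$$

to the data points. Moreover,

$$\mathbf{v}^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix}$$

is given by the formula

$$\mathbf{v}^* = (M^TM)^{-1}M^T\mathbf{y} \tag{6}$$

which expresses the fact that $\mathbf{v} = \mathbf{v}^*$ is the unique solution of the normal equations

$$M^TM\mathbf{v} = M^T\mathbf{y} \tag{7}$$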


EXAMPLE  1 Least  Squares  Straight  Line  Fit 


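Find the least squares straight line fit to the four points (0, 1), (1, 3), (2, 4), and (3, 4). (See Figure 6.5.3.)

Figure 6.5.3

Solution We have

$$M = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad M^TM = \begin{bmatrix} 4 & 6 \\ 6 & 14 \end{bmatrix}, \quad \text{and} \quad (M^TM)^{-1} = \frac{1}{10}\begin{bmatrix} 7 & -3 \\ -3 & 2 \end{bmatrix}$$

$$\mathbf{v}^* = (M^TM)^{-1}M^T\mathbf{y} = \frac{1}{10}\begin{bmatrix} 7 & -3 \\ -3 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 4 \\ 4 \end{bmatrix} = \begin{bmatrix} 1.5 \\ 1 \end{bmatrix}$$

so the desired line is y = 1.5 + x.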
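The computation in Example 1 can be checked numerically. The sketch below solves the 2 × 2 normal equations directly by inverting $M^TM$ in closed form; the function name `least_squares_line` is a hypothetical helper introduced here, and only the Python standard library is assumed.

```python
def least_squares_line(points):
    """Solve the normal equations M^T M v = M^T y for v = (a, b)."""
    n = len(points)
    sx = sum(x for x, _ in points)        # entries of M^T M are n, sx, sxx
    sxx = sum(x * x for x, _ in points)
    sy = sum(y for _, y in points)        # entries of M^T y are sy, sxy
    sxy = sum(x * y for x, y in points)
    det = n * sxx - sx * sx               # determinant of M^T M
    if det == 0:
        raise ValueError("points lie on a vertical line; no unique fit")
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

# Data points from Example 1.
a, b = least_squares_line([(0, 1), (1, 3), (2, 4), (3, 4)])
print(a, b)
```

The determinant test mirrors the hypothesis of the theorem: $M^TM$ fails to be invertible exactly when all the x-values coincide, that is, when the points lie on a vertical line.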


EXAMPLE  2 Spring  Constant 


Hooke's law in physics states that the length x of a uniform spring is a linear function of the force y applied to it. If we express this relationship as y = a + bx, then the coefficient b is called the spring constant. Suppose a particular unstretched spring has a measured length of 6.1 inches (i.e., x = 6.1 when y = 0). Forces of 2 pounds, 4 pounds, and 6 pounds are then applied to the spring, and the corresponding lengths are found to be 7.6 inches, 8.7 inches, and 10.4 inches (see Figure 6.5.4). Find the spring constant.

Figure 6.5.4

Length x_i (inches)    Force y_i (pounds)
6.1                    0
7.6                    2
8.7                    4
10.4                   6

We have

$$M = \begin{bmatrix} 1 & 6.1 \\ 1 & 7.6 \\ 1 & 8.7 \\ 1 & 10.4 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} 0 \\ 2 \\ 4 \\ 6 \end{bmatrix}$$

and

$$\mathbf{v}^* = \begin{bmatrix} a^* \\ b^* \end{bmatrix} = (M^TM)^{-1}M^T\mathbf{y} \approx \begin{bmatrix} -8.6 \\ 1.4 \end{bmatrix}$$

where the numerical values have been rounded to one decimal place. Thus, the estimated value of the spring constant is $b \approx 1.4$ pounds/inch.
Source: NASA

On October 5, 1991 the Magellan spacecraft entered the atmosphere of Venus and transmitted the temperature T in kelvins (K) versus the altitude h in kilometers (km) until its signal was lost at an altitude of about 34 km. Discounting the initial erratic signal, the data strongly suggested a linear relationship, so a least squares straight line fit was used on the linear part of the data to obtain the equation

T = 737.5 - 8.125h

By setting h = 0 in this equation, the surface temperature of Venus was estimated at 737.5 K.

[Figure: Temperature of Venusian Atmosphere. Magellan orbit 3213; Date: 5 October 1991; Latitude: 67 N; LTST: 22:05]
Least Squares Fit of a Polynomial

The  technique  described  for  fitting  a straight  line  to  data  points  can  be  generalized  to  fitting  a polynomial  of 
specified  degree  to  data  points.  Let  us  attempt  to  fit  a polynomial  of  fixed  degree  m 

$$y = a_0 + a_1x + \cdots + a_mx^m \tag{8}$$

to n points

$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n)$$

Substituting  these  n values  of  x and  y into  8 yields  the  n equations 


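$$\begin{aligned} y_1 &= a_0 + a_1x_1 + \cdots + a_mx_1^m \\ y_2 &= a_0 + a_1x_2 + \cdots + a_mx_2^m \\ &\ \,\vdots \\ y_n &= a_0 + a_1x_n + \cdots + a_mx_n^m \end{aligned}$$

or, in matrix form,

$$\mathbf{y} = M\mathbf{v} \tag{9}$$

where

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad M = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^m \\ 1 & x_2 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^m \end{bmatrix}, \quad \mathbf{v} = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_m \end{bmatrix} \tag{10}$$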

As  before,  the  solutions  of  the  normal  equations 

$$M^TM\mathbf{v} = M^T\mathbf{y}$$

determine the coefficients of the polynomial, and the vector v minimizes

$$\|\mathbf{y} - M\mathbf{v}\|$$

Conditions that guarantee the invertibility of $M^TM$ are discussed in the exercises (Exercise 7). If $M^TM$ is invertible, then the normal equations have a unique solution $\mathbf{v} = \mathbf{v}^*$, which is given by

$$\mathbf{v}^* = (M^TM)^{-1}M^T\mathbf{y} \tag{11}$$

EXAMPLE  3 Fitting  a Quadratic  Curve  to  Data 

According  to  Newton's  second  law  of  motion,  a body  near  the  Earth's  surface  falls  vertically 
downward  according  to  the  equation 

$$s = s_0 + v_0t + \tfrac{1}{2}gt^2 \tag{12}$$

where

s = vertical displacement downward relative to some fixed point
$s_0$ = initial displacement at time t = 0
$v_0$ = initial velocity at time t = 0
g = acceleration of gravity at the Earth's surface

Suppose that a laboratory experiment is performed to evaluate g from Equation 12 by releasing a weight with unknown initial displacement and velocity and measuring the distance it has fallen at certain times relative to a fixed reference point. Suppose it is found that at times t = .1, .2, .3, .4, and .5 seconds the weight has fallen s = -0.18, 0.31, 1.03, 2.48, and 3.73 feet, respectively, from the reference point. Find an approximate value of g using these data.

The mathematical problem is to fit a quadratic curve

$$s = a_0 + a_1t + a_2t^2 \tag{13}$$

to the five data points:

(.1, -0.18), (.2, 0.31), (.3, 1.03), (.4, 2.48), (.5, 3.73)

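With the appropriate adjustments in notation, the matrices M and y in 10 are

$$M = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ 1 & t_3 & t_3^2 \\ 1 & t_4 & t_4^2 \\ 1 & t_5 & t_5^2 \end{bmatrix} = \begin{bmatrix} 1 & .1 & .01 \\ 1 & .2 & .04 \\ 1 & .3 & .09 \\ 1 & .4 & .16 \\ 1 & .5 & .25 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} s_1 \\ s_2 \\ s_3 \\ s_4 \\ s_5 \end{bmatrix} = \begin{bmatrix} -0.18 \\ 0.31 \\ 1.03 \\ 2.48 \\ 3.73 \end{bmatrix}$$

Thus, from 11,

$$\mathbf{v}^* = \begin{bmatrix} a_0^* \\ a_1^* \\ a_2^* \end{bmatrix} = (M^TM)^{-1}M^T\mathbf{y} \approx \begin{bmatrix} -0.40 \\ 0.35 \\ 16.1 \end{bmatrix}$$

From 12 and 13, we have $a_2 = \tfrac{1}{2}g$, so the estimated value of g is

$$g = 2a_2^* = 2(16.1) = 32.2 \text{ feet/second}^2$$

If desired, we can also estimate the initial displacement and initial velocity of the weight:

$$s_0 = a_0^* = -0.40 \text{ feet}$$

$$v_0 = a_1^* = 0.35 \text{ feet/second}$$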


In  Figure  6.5.5  we  have  plotted  the  five  data  points  and  the  approximating  polynomial. 


Figure  6.5.5 
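The quadratic fit in Example 3 can be reproduced with a short program. The following sketch builds $M^TM$ and $M^T\mathbf{y}$ for a polynomial of given degree and solves the normal equations by Gaussian elimination. The helper names `solve_linear_system` and `least_squares_polynomial` are assumptions introduced for illustration, and elimination is used here in place of forming the inverse $(M^TM)^{-1}$ explicitly.

```python
def solve_linear_system(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def least_squares_polynomial(points, degree):
    """Return [a_0, ..., a_m] solving the normal equations M^T M v = M^T y."""
    M = [[x ** j for j in range(degree + 1)] for x, _ in points]
    y = [yi for _, yi in points]
    cols = range(degree + 1)
    MtM = [[sum(row[i] * row[j] for row in M) for j in cols] for i in cols]
    Mty = [sum(row[i] * yi for row, yi in zip(M, y)) for i in cols]
    return solve_linear_system(MtM, Mty)

# Data points from Example 3 (time in seconds, distance in feet).
data = [(0.1, -0.18), (0.2, 0.31), (0.3, 1.03), (0.4, 2.48), (0.5, 3.73)]
a0, a1, a2 = least_squares_polynomial(data, 2)
print(a0, a1, a2, 2 * a2)
```

The unrounded coefficients agree with the rounded values in the example. Note that doubling the unrounded $a_2^*$ gives a slightly smaller estimate of g than doubling the rounded value 16.1, so the last digit of the estimate depends on when the rounding is done.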


Concept  Review 

Least  squares  straight  line  fit 

Regression  line 

Least  squares  polynomial  fit 


Skills 


Find  the  least  squares  straight  line  fit  to  a set  of  data  points. 
Find  the  least  squares  polynomial  fit  to  a set  of  data  points. 
Use  the  techniques  of  this  section  to  solve  applied  problems. 


Exercise  Set  6.5 

1.  Find  the  least  squares  straight  line  fit  to  the  three  points  (0,  0),  (1,  2),  and  (2,  7). 

Answer: 

$y = -\frac{1}{2} + \frac{7}{2}x$

2.  Find  the  least  squares  straight  line  fit  to  the  four  points  (0,  1 ) , (2,  0) , (3,  1 ) , and  (3,2). 

3. Find the quadratic polynomial that best fits the four points (2, 0), (3, -10), (5, -48), and (6, -76).

Answer:

$y = 2 + 5x - 3x^2$

4. Find the cubic polynomial that best fits the five points (-1, -14), (0, -5), (1, -4), (2, 1), and (3, 22).

5. Show that the matrix M in Equation 2 has linearly independent columns if and only if at least two of the numbers $x_1, x_2, \ldots, x_n$ are distinct.

6. Show that the columns of the n × (m + 1) matrix M in Equation 10 are linearly independent if n > m and at least m + 1 of the numbers $x_1, x_2, \ldots, x_n$ are distinct. [Hint: A nonzero polynomial of degree m has at most m distinct roots.]

7. Let M be the matrix in Equation 10. Using Exercise 6, show that a sufficient condition for the matrix $M^TM$ to be invertible is that n > m and that at least m + 1 of the numbers $x_1, x_2, \ldots, x_n$ are distinct.

8.  The  owner  of  a rapidly  expanding  business  finds  that  for  the  first  five  months  of  the  year  the  sales  (in 
thousands)  are  $4.0,  $4.4,  $5.2,  $6.4,  and  $8.0.  The  owner  plots  these  figures  on  a graph  and  conjectures 
that  for  the  rest  of  the  year,  the  sales  curve  can  be  approximated  by  a quadratic  polynomial.  Find  the  least 
squares  quadratic  polynomial  fit  to  the  sales  curve,  and  use  it  to  project  the  sales  for  the  twelfth  month  of 
the  year. 

9. A corporation obtains the following data relating the number of sales representatives on its staff to annual sales:

Number of Sales Representatives    5      10     15     20     25     30
Annual Sales (millions)            3.4    4.3    5.2    6.1    7.2    8.3

Explain  how  you  might  use  least  squares  methods  to  estimate  the  annual  sales  with  45  representatives,  and 
discuss  the  assumptions  that  you  are  making.  (You  need  not  perform  the  actual  computations.) 


10. Pathfinder is an experimental, lightweight, remotely piloted, solar-powered aircraft that was used in a series of experiments by NASA to determine the feasibility of applying solar power for long-duration, high-altitude flight. In August 1997 Pathfinder recorded the data in the accompanying table relating altitude H and temperature T. Show that a linear model is reasonable by plotting the data, and then find the least squares line $H = H_0 + kT$ of best fit.

Table Ex-10

Altitude H (thousands of feet)    15     20      25       30       35       40       45
Temperature T (°C)                4.5    -5.9    -16.1    -27.6    -39.8    -50.2    -62.9

11. Find a curve of the form y = a + (b/x) that best fits the data points (1, 7), (3, 3), (6, 1) by making the substitution X = 1/x. Draw the curve and plot the data points in the same coordinate system.

Answer:

$y = \frac{5}{21} + \frac{48}{7x}$

True-False  Exercises 

In  parts  (a)-(d)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  Every  set  of  data  points  has  a unique  least  squares  straight  line  fit. 

Answer: 

False 

(b) If the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are not collinear, then 1 is an inconsistent system.

Answer:

True

(c) If y = a + bx is the least squares line fit to the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, then $d_i = |y_i - (a + bx_i)|$ is minimal for every $1 \le i \le n$.

Answer:

False

(d) If y = a + bx is the least squares line fit to the data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, then $\sum_{i=1}^{n} [y_i - (a + bx_i)]^2$ is minimal.

Answer: 

True 
6.6 Function Approximation; Fourier Series

In this section we will show how orthogonal projections can be used to approximate certain types of functions by simpler functions that are easier to work with. The ideas explained here have important applications in engineering and science. Calculus is required.


Best  Approximations 

All  of  the  problems  that  we  will  study  in  this  section  will  be  special  cases  of  the  following  general  problem. 


APPROXIMATION  PROBLEM 

Given a function f that is continuous on an interval [a, b], find the “best possible approximation” to f using only functions from a specified subspace W of C[a, b].

Here are some examples of such problems:

Find the best possible approximation to $e^x$ over [0, 1] by a polynomial of the form $a_0 + a_1x + a_2x^2$.

Find the best possible approximation to $\sin \pi x$ over [-1, 1] by a function of the form $a_0 + a_1e^x + a_2e^{2x} + a_3e^{3x}$.

Find the best possible approximation to x over [0, 2π] by a function of the form $a_0 + a_1\sin x + a_2\sin 2x + b_1\cos x + b_2\cos 2x$.

In the first example W is the subspace of C[0, 1] spanned by 1, x, and $x^2$; in the second example W is the subspace of C[-1, 1] spanned by 1, $e^x$, $e^{2x}$, and $e^{3x}$; and in the third example W is the subspace of C[0, 2π] spanned by 1, sin x, sin 2x, cos x, and cos 2x.


Measurements  of  Error 

To solve approximation problems of the preceding types, we first need to make the phrase “best approximation over [a, b]” mathematically precise. To do this we will need some way of quantifying the error that results when one continuous function is approximated by another over an interval [a, b]. If we were to approximate f(x) by g(x), and if we were concerned only with the error in that approximation at a single point $x_0$, then it would be natural to define the error to be

$$\text{error} = |f(x_0) - g(x_0)|$$

sometimes called the deviation between f and g at $x_0$ (Figure 6.6.1). However, we are not concerned simply with measuring the error at a single point but rather with measuring it over the entire interval [a, b]. The problem is that an approximation may have small deviations in one part of the interval and large deviations in another. One possible way of accounting for this is to integrate the deviation |f(x) - g(x)| over the interval [a, b] and define the error over the interval to be

$$\text{error} = \int_a^b |f(x) - g(x)|\,dx \tag{1}$$

Geometrically, 1 is the area between the graphs of f(x) and g(x) over the interval [a, b] (Figure 6.6.2); the greater the area, the greater the overall error.

Figure 6.6.1: The deviation between f and g at $x_0$

Figure 6.6.2: The area between the graphs of f and g over [a, b] measures the error in approximating f by g over [a, b]


Although  1 is  natural  and  appealing  geometrically,  most  mathematicians  and  scientists  generally  favor  the 
following  alternative  measure  of  error,  called  the  mean  square  error. 


$$\text{mean square error} = \int_a^b [f(x) - g(x)]^2\,dx$$

Mean square error emphasizes the effect of larger errors because of the squaring and has the added advantage that it allows us to bring to bear the theory of inner product spaces. To see how, suppose that f is a continuous function on [a, b] that we want to approximate by a function g from a subspace W of C[a, b], and suppose that C[a, b] is given the inner product

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_a^b f(x)g(x)\,dx$$

It follows that

$$\|\mathbf{f} - \mathbf{g}\|^2 = \langle \mathbf{f} - \mathbf{g}, \mathbf{f} - \mathbf{g} \rangle = \int_a^b [f(x) - g(x)]^2\,dx = \text{mean square error}$$

so minimizing the mean square error is the same as minimizing $\|\mathbf{f} - \mathbf{g}\|^2$. Thus the approximation problem posed informally at the beginning of this section can be restated more precisely as follows.


Least Squares Approximation

LEAST SQUARES APPROXIMATION PROBLEM

Let f be a function that is continuous on an interval [a, b], let C[a, b] have the inner product

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_a^b f(x)g(x)\,dx$$

and let W be a finite-dimensional subspace of C[a, b]. Find a function g in W that minimizes

$$\|\mathbf{f} - \mathbf{g}\|^2 = \int_a^b [f(x) - g(x)]^2\,dx$$

Since $\|\mathbf{f} - \mathbf{g}\|^2$ and $\|\mathbf{f} - \mathbf{g}\|$ are minimized by the same function g, this problem is equivalent to looking for a function g in W that is closest to f. But we know from Theorem 6.4.1 that $\mathbf{g} = \operatorname{proj}_W \mathbf{f}$ is such a function (Figure 6.6.3).

Figure 6.6.3: f is the function in C[a, b] to be approximated; W is the subspace of approximating functions; $\mathbf{g} = \operatorname{proj}_W \mathbf{f}$ is the least squares approximation to f from W.

Thus,  we  have  the  following  result. 


THEOREM  6.6.1 

If f is a continuous function on [a, b], and W is a finite-dimensional subspace of C[a, b], then the function g in W that minimizes the mean square error

$$\int_a^b [f(x) - g(x)]^2\,dx$$

is $\mathbf{g} = \operatorname{proj}_W \mathbf{f}$, where the orthogonal projection is relative to the inner product

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_a^b f(x)g(x)\,dx$$

The function $\mathbf{g} = \operatorname{proj}_W \mathbf{f}$ is called the least squares approximation to f from W.
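As a small numerical illustration of Theorem 6.6.1, the sketch below approximates $f(x) = x^2$ on [0, 1] by a function $a_0 + a_1x$ from W = span{1, x}. The choice of f, the helper names `integrate` and `inner`, and the use of Simpson's rule are all assumptions made for this example. Instead of building an orthonormal basis, the sketch imposes the equivalent condition that the error f - g be orthogonal to each basis function of W, which leads to a 2 × 2 linear system.

```python
def integrate(func, a, b, steps=1000):
    """Approximate the integral of func over [a, b] by Simpson's rule."""
    h = (b - a) / steps  # steps must be even
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def inner(f, g, a=0.0, b=1.0):
    """Inner product <f, g> = integral of f(x) g(x) over [a, b]."""
    return integrate(lambda x: f(x) * g(x), a, b)

one = lambda x: 1.0
ident = lambda x: x
f = lambda x: x * x  # function to approximate (illustrative choice)

# Orthogonality of f - g to 1 and to x gives a 2x2 system for a0, a1.
g11, g12, g22 = inner(one, one), inner(one, ident), inner(ident, ident)
r1, r2 = inner(f, one), inner(f, ident)
det = g11 * g22 - g12 * g12
a0 = (r1 * g22 - r2 * g12) / det
a1 = (g11 * r2 - g12 * r1) / det
print(a0, a1)
```

For this choice of f the integrals can also be done by hand, and the result is the line x - 1/6, which the program reproduces up to rounding.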


Fourier  Series 


A function of the form

$$T(x) = c_0 + c_1\cos x + c_2\cos 2x + \cdots + c_n\cos nx + d_1\sin x + d_2\sin 2x + \cdots + d_n\sin nx \tag{2}$$

is called a trigonometric polynomial; if $c_n$ and $d_n$ are not both zero, then T(x) is said to have order n. For example,

$$T(x) = 2 + \cos x - 3\cos 2x + 7\sin 4x$$

is a trigonometric polynomial of order 4 with

$$c_0 = 2,\ c_1 = 1,\ c_2 = -3,\ c_3 = 0,\ c_4 = 0,\ d_1 = 0,\ d_2 = 0,\ d_3 = 0,\ d_4 = 7$$

It is evident from 2 that the trigonometric polynomials of order n or less are the various possible linear combinations of

$$1,\ \cos x,\ \cos 2x,\ \ldots,\ \cos nx,\ \sin x,\ \sin 2x,\ \ldots,\ \sin nx \tag{3}$$

It can be shown that these 2n + 1 functions are linearly independent and thus form a basis for a (2n + 1)-dimensional subspace of C[a, b].

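Let us now consider the problem of finding the least squares approximation of a continuous function f(x) over the interval [0, 2π] by a trigonometric polynomial of order n or less. As noted above, the least squares approximation to f from W is the orthogonal projection of f on W. To find this orthogonal projection, we must find an orthonormal basis $\mathbf{g}_0, \mathbf{g}_1, \ldots, \mathbf{g}_{2n}$ for W, after which we can compute the orthogonal projection on W from the formula

$$\operatorname{proj}_W \mathbf{f} = \langle \mathbf{f}, \mathbf{g}_0 \rangle \mathbf{g}_0 + \langle \mathbf{f}, \mathbf{g}_1 \rangle \mathbf{g}_1 + \cdots + \langle \mathbf{f}, \mathbf{g}_{2n} \rangle \mathbf{g}_{2n} \tag{4}$$

(see Theorem 6.3.4b). An orthonormal basis for W can be obtained by applying the Gram-Schmidt process to the basis vectors in 3 using the inner product

$$\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^{2\pi} f(x)g(x)\,dx$$

This yields the orthonormal basis

$$\mathbf{g}_0 = \frac{1}{\sqrt{2\pi}},\quad \mathbf{g}_1 = \frac{1}{\sqrt{\pi}}\cos x,\ \ldots,\ \mathbf{g}_n = \frac{1}{\sqrt{\pi}}\cos nx,\quad \mathbf{g}_{n+1} = \frac{1}{\sqrt{\pi}}\sin x,\ \ldots,\ \mathbf{g}_{2n} = \frac{1}{\sqrt{\pi}}\sin nx \tag{5}$$

(see Exercise 6). If we introduce the notation

$$a_0 = \frac{2}{\sqrt{2\pi}}\langle \mathbf{f}, \mathbf{g}_0 \rangle,\quad a_1 = \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_1 \rangle,\ \ldots,\ a_n = \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_n \rangle,\quad b_1 = \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_{n+1} \rangle,\ \ldots,\ b_n = \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_{2n} \rangle \tag{6}$$

then on substituting 5 in 4, we obtain

$$\operatorname{proj}_W \mathbf{f} = \frac{a_0}{2} + [a_1\cos x + \cdots + a_n\cos nx] + [b_1\sin x + \cdots + b_n\sin nx] \tag{7}$$

where

$$\begin{aligned} a_0 &= \frac{2}{\sqrt{2\pi}}\langle \mathbf{f}, \mathbf{g}_0 \rangle = \frac{2}{\sqrt{2\pi}}\int_0^{2\pi} f(x)\frac{1}{\sqrt{2\pi}}\,dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\,dx \\ a_1 &= \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_1 \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\frac{1}{\sqrt{\pi}}\cos x\,dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos x\,dx \\ &\ \,\vdots \\ a_n &= \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_n \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\frac{1}{\sqrt{\pi}}\cos nx\,dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos nx\,dx \\ b_1 &= \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_{n+1} \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\frac{1}{\sqrt{\pi}}\sin x\,dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin x\,dx \\ &\ \,\vdots \\ b_n &= \frac{1}{\sqrt{\pi}}\langle \mathbf{f}, \mathbf{g}_{2n} \rangle = \frac{1}{\sqrt{\pi}}\int_0^{2\pi} f(x)\frac{1}{\sqrt{\pi}}\sin nx\,dx = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin nx\,dx \end{aligned}$$

In short,

$$a_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos kx\,dx, \quad b_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin kx\,dx \tag{8}$$

The numbers $a_0, a_1, \ldots, a_n, b_1, \ldots, b_n$ are called the Fourier coefficients of f.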

EXAMPLE  1 Least  Squares  Approximations 

Find the least squares approximation of f(x) = x on [0, 2π] by

(a) a trigonometric polynomial of order 2 or less;

(b) a trigonometric polynomial of order n or less.


Solution

(a)

$$a_0 = \frac{1}{\pi}\int_0^{2\pi} f(x)\,dx = \frac{1}{\pi}\int_0^{2\pi} x\,dx = 2\pi \tag{9a}$$

For k = 1, 2, ..., integration by parts yields (verify)

$$a_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\cos kx\,dx = \frac{1}{\pi}\int_0^{2\pi} x\cos kx\,dx = 0 \tag{9b}$$

$$b_k = \frac{1}{\pi}\int_0^{2\pi} f(x)\sin kx\,dx = \frac{1}{\pi}\int_0^{2\pi} x\sin kx\,dx = -\frac{2}{k} \tag{9c}$$

Thus, the least squares approximation to x on [0, 2π] by a trigonometric polynomial of order 2 or less is

$$x \approx \frac{a_0}{2} + a_1\cos x + a_2\cos 2x + b_1\sin x + b_2\sin 2x$$

or, from (9a), (9b), and (9c),

$$x \approx \pi - 2\sin x - \sin 2x$$

(b) The least squares approximation to x on [0, 2π] by a trigonometric polynomial of order n or less is

$$x \approx \frac{a_0}{2} + [a_1\cos x + \cdots + a_n\cos nx] + [b_1\sin x + \cdots + b_n\sin nx]$$

or, from (9a), (9b), and (9c),

$$x \approx \pi - 2\left(\sin x + \frac{\sin 2x}{2} + \frac{\sin 3x}{3} + \cdots + \frac{\sin nx}{n}\right)$$

The graphs of y = x and some of these approximations are shown in Figure 6.6.4.


Figure  6.6.4 
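The Fourier coefficients found in this example can be checked by numerical integration. The sketch below evaluates the integral formulas for $a_k$ and $b_k$ with Simpson's rule; the helper names `integrate` and `fourier_coefficients` and the step count are assumptions made for illustration, and the results are approximations rather than exact values.

```python
import math

def integrate(func, a, b, steps=2000):
    """Approximate the integral of func over [a, b] by Simpson's rule."""
    h = (b - a) / steps  # steps must be even
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def fourier_coefficients(f, n):
    """Return ([a_0, ..., a_n], [b_1, ..., b_n]) for f on [0, 2*pi]."""
    two_pi = 2 * math.pi
    a = [integrate(lambda x: f(x) * math.cos(k * x), 0.0, two_pi) / math.pi
         for k in range(n + 1)]
    b = [integrate(lambda x: f(x) * math.sin(k * x), 0.0, two_pi) / math.pi
         for k in range(1, n + 1)]
    return a, b

# f(x) = x, as in the example above.
a, b = fourier_coefficients(lambda x: x, 3)
print(a)
print(b)
```

Up to the accuracy of the numerical integration, the first list should show $a_0$ close to 2π followed by values near zero, and the second list should be close to -2, -1, -2/3, in agreement with (9a), (9b), and (9c).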


It is natural to expect that the mean square error will diminish as the number of terms in the least squares approximation

$$f(x) \approx \frac{a_0}{2} + \sum_{k=1}^{n} (a_k\cos kx + b_k\sin kx)$$

increases. It can be proved that for functions f in C[0, 2π], the mean square error approaches zero as $n \to +\infty$; this is denoted by writing

$$f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k\cos kx + b_k\sin kx)$$

The right side of this equation is called the Fourier series for f over the interval [0, 2π]. Such series are of major importance in engineering, science, and mathematics.
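The claim that the mean square error diminishes as terms are added can be observed numerically for f(x) = x. The following sketch integrates the squared difference between x and its order-n least squares approximation for a few values of n. The helper names `integrate`, `partial_sum`, and `mean_square_error`, and the particular orders tested, are assumptions chosen for illustration.

```python
import math

def integrate(func, a, b, steps=2000):
    """Approximate the integral of func over [a, b] by Simpson's rule."""
    h = (b - a) / steps  # steps must be even
    total = func(a) + func(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def partial_sum(x, n):
    """Order-n least squares approximation to f(x) = x on [0, 2*pi]."""
    return math.pi - 2 * sum(math.sin(k * x) / k for k in range(1, n + 1))

def mean_square_error(n):
    """Integral of [x - partial_sum(x, n)]^2 over [0, 2*pi]."""
    return integrate(lambda x: (x - partial_sum(x, n)) ** 2, 0.0, 2 * math.pi)

errors = [mean_square_error(n) for n in (1, 2, 4, 8)]
print(errors)
```

Each entry of the printed list should be smaller than the one before it, which is consistent with the error approaching zero as n grows; the sketch does not, of course, prove the limit.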


Jean  Baptiste  Fourier  (1768-1830) 

Fourier  was  a French  mathematician  and  physicist  who  discovered 
the  Fourier  series  and  related  ideas  while  working  on  problems  of  heat  diffusion.  This 
discovery  was  one  of  the  most  influential  in  the  history  of  mathematics;  it  is  the 
cornerstone  of  many  fields  of  mathematical  research  and  a basic  tool  in  many  branches 
of  engineering.  Fourier,  a political  activist  during  the  French  revolution,  spent  time  in 
jail  for  his  defense  of  many  victims  during  the  Terror.  He  later  became  a favorite  of 
Napoleon  and  was  named  a baron. 

[Image:  The  Granger  Collection,  New  York ] 


Concept  Review 

Approximation  of  functions 
Mean  square  error 
Least  squares  approximation 
Trigonometric  polynomial 
Fourier  coefficients 
Fourier  series 

Skills 

Find  the  least  squares  approximation  of  a function. 

Find  the  mean  square  error  of  the  least  squares  approximation  of  a function. 
Compute  the  Fourier  series  of  a function. 


Exercise  Set  6.6 


1. Find the least squares approximation of f(x) = 1 + x over the interval [0, 2π] by

(a) a trigonometric polynomial of order 2 or less.

(b) a trigonometric polynomial of order n or less.

Answer:

(a) $(1 + \pi) - 2\sin x - \sin 2x$

(b) $(1 + \pi) - 2\left[\sin x + \dfrac{\sin 2x}{2} + \cdots + \dfrac{\sin nx}{n}\right]$

2. Find the least squares approximation of $f(x) = x^2$ over the interval [0, 2π] by

(a)  a trigonometric  polynomial  of  order  3 or  less. 

(b)  a trigonometric  polynomial  of  order  n or  less. 

3. (a) Find the least squares approximation of x over the interval [0, 1] by a function of the form $a + be^x$.

(b) Find the mean square error of the approximation.

Answer:

(a) $-\dfrac{1}{2} + \dfrac{1}{e - 1}e^x$

(b) $\dfrac{13}{12} + \dfrac{1 + e}{2(1 - e)}$

4. (a) Find the least squares approximation of $e^x$ over the interval [0, 1] by a polynomial of the form $a_0 + a_1x$.

(b) Find the mean square error of the approximation.

5. (a) Find the least squares approximation of $\sin \pi x$ over the interval [-1, 1] by a polynomial of the form $a_0 + a_1x + a_2x^2$.

(b) Find the mean square error of the approximation.


Answer:

(a) $\dfrac{3}{\pi}x$

6. Use the Gram-Schmidt process to obtain the orthonormal basis 5 from the basis 3.

7. Carry out the integrations indicated in Formulas 9a, 9b, and 9c.

8. Find the Fourier series of f(x) = π - x over the interval [0, 2π].

9. Find the Fourier series of f(x) = 1, 0 < x < π and f(x) = 0, π ≤ x ≤ 2π over the interval [0, 2π].


Answer:

$$\frac{1}{2} + \sum_{k=1}^{\infty} \frac{1}{k\pi}\left[1 - (-1)^k\right]\sin kx$$


10.  What  is  the  Fourier  series  of  sin(3x)? 

True-False  Exercises 

In  parts  (a)-(e)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a) If a function f in C[a, b] is approximated by the function g, then the mean square error is the same as the area between the graphs of f(x) and g(x) over the interval [a, b].

Answer:

False

(b) Given a finite-dimensional subspace W of C[a, b], the function $\mathbf{g} = \operatorname{proj}_W \mathbf{f}$ minimizes the mean square error.

Answer:

True

(c) {1, cos x, sin x, cos 2x, sin 2x} is an orthogonal subset of the vector space C[0, 2π] with respect to the inner product $\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^{2\pi} f(x)g(x)\,dx$.

Answer:

True

(d) {1, cos x, sin x, cos 2x, sin 2x} is an orthonormal subset of the vector space C[0, 2π] with respect to the inner product $\langle \mathbf{f}, \mathbf{g} \rangle = \int_0^{2\pi} f(x)g(x)\,dx$.

Answer:

False

(e) {1, cos x, sin x, cos 2x, sin 2x} is a linearly independent subset of C[0, 2π].


Answer: 


True 
Supplementary Exercises


1. Let $R^4$ have the Euclidean inner product.

(a) Find a vector in $R^4$ that is orthogonal to $\mathbf{u}_1 = (1, 0, 0, 0)$ and $\mathbf{u}_4 = (0, 0, 0, 1)$ and makes equal angles with $\mathbf{u}_2 = (0, 1, 0, 0)$ and $\mathbf{u}_3 = (0, 0, 1, 0)$.

(b) Find a vector $\mathbf{x} = (x_1, x_2, x_3, x_4)$ of length 1 that is orthogonal to $\mathbf{u}_1$ and $\mathbf{u}_4$ above and such that the cosine of the angle between x and $\mathbf{u}_2$ is twice the cosine of the angle between x and $\mathbf{u}_3$.

Answer:

(a) (0, a, a, 0) with a ≠ 0

(b) $\pm\left(0, \dfrac{2}{\sqrt{5}}, \dfrac{1}{\sqrt{5}}, 0\right)$

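2. Prove: If $\langle \mathbf{u}, \mathbf{v} \rangle$ is the Euclidean inner product on $R^n$, and if A is an n × n matrix, then

$$\langle \mathbf{u}, A\mathbf{v} \rangle = \langle A^T\mathbf{u}, \mathbf{v} \rangle$$

[Hint: Use the fact that $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v} = \mathbf{v}^T\mathbf{u}$.]

3. Let $M_{22}$ have the inner product $\langle U, V \rangle = \operatorname{tr}(U^TV) = \operatorname{tr}(V^TU)$ that was defined in Example 6 of Section 6.1. Describe the orthogonal complement of

(a) the subspace of all diagonal matrices.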

(b)  the  subspace  of  symmetric  matrices. 


Answer: 


(a)  The  subspace  of  all  matrices  in  M22  with  only  zeros  on  the  diagonal. 

(b)  The  subspace  of  all  skew-symmetric  matrices  in  il^22- 

4.  Let  = 0 be  a system  of  m equations  in  n unknowns.  Show  that 

■*r 

*2 

x=  . 

xn 

is  a solution  of  this  system  if  and  only  if  the  vector  x = (xj,  xj, ...,  xn)  is  orthogonal  to  every  row  vector 
of  A with  respect  to  the  Euclidean  inner  product  on  Rn. 

5.  Use the Cauchy-Schwarz inequality to show that if $a_1, a_2, \ldots, a_n$ are positive real numbers, then

\[ (a_1 + a_2 + \cdots + a_n)\left(\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}\right) \geq n^2 \]

6.  Show that if $\mathbf{x}$ and $\mathbf{y}$ are vectors in an inner product space and c is any scalar, then

\[ \|c\mathbf{x} + \mathbf{y}\|^2 = c^2\|\mathbf{x}\|^2 + 2c\langle \mathbf{x}, \mathbf{y}\rangle + \|\mathbf{y}\|^2 \]

7.  Let $R^3$ have the Euclidean inner product. Find two vectors of length 1 that are orthogonal to all three of
the vectors $\mathbf{u}_1 = (1, 1, -1)$, $\mathbf{u}_2 = (-2, -1, 2)$, and $\mathbf{u}_3 = (-1, 0, 1)$.

Answer: 


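8.  Find a weighted Euclidean inner product on $R^n$ such that the vectors

$\mathbf{v}_1 = (1, 0, 0, \ldots, 0)$

$\mathbf{v}_2 = (0, \sqrt{2}, 0, \ldots, 0)$

$\mathbf{v}_3 = (0, 0, \sqrt{3}, \ldots, 0)$

$\vdots$

$\mathbf{v}_n = (0, 0, 0, \ldots, \sqrt{n})$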


form  an  orthonormal  set. 

9.  Is there a weighted Euclidean inner product on $R^2$ for which the vectors (1, 2) and (3, -1) form an
orthonormal  set?  Justify  your  answer. 


Answer: 


No 

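10.  If $\mathbf{u}$ and $\mathbf{v}$ are vectors in an inner product space V, then $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{u} - \mathbf{v}$ can be regarded as sides of a
"triangle" in V (see the accompanying figure). Prove that the law of cosines holds for any such triangle;
that is,

\[ \|\mathbf{u} - \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\|\mathbf{u}\|\|\mathbf{v}\|\cos\theta \]
where $\theta$ is the angle between $\mathbf{u}$ and $\mathbf{v}$.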


Figure Ex-10 (triangle with sides $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{u} - \mathbf{v}$)

11.  (a)  As shown in Figure 3.2.6, the vectors $(k, 0, 0)$, $(0, k, 0)$, and $(0, 0, k)$ form the edges of a cube in
$R^3$ with diagonal $(k, k, k)$. Similarly, the vectors

\[ (k, 0, 0, \ldots, 0), \quad (0, k, 0, \ldots, 0), \quad \ldots, \quad (0, 0, 0, \ldots, k) \]

can be regarded as edges of a "cube" in $R^n$ with diagonal $(k, k, k, \ldots, k)$. Show that each of the above
edges makes an angle of $\theta$ with the diagonal, where $\cos\theta = 1/\sqrt{n}$.

(b)  Calculus required. What happens to the angle $\theta$ in part (a) as the dimension of $R^n$ approaches $+\infty$?


Answer: 


(b)  $\theta$ approaches $\pi/2$

12.  Let  u and  v be  vectors  in  an  inner  product  space. 

(a)  Prove that $\|\mathbf{u}\| = \|\mathbf{v}\|$ if and only if $\mathbf{u} + \mathbf{v}$ and $\mathbf{u} - \mathbf{v}$ are orthogonal.

(b)  Give  a geometric  interpretation  of  this  result  in  R2  with  the  Euclidean  inner  product. 

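13.  Let $\mathbf{u}$ be a vector in an inner product space V, and let $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be an orthonormal basis for V.
Show that if $\alpha_i$ is the angle between $\mathbf{u}$ and $\mathbf{v}_i$, then

\[ \cos^2\alpha_1 + \cos^2\alpha_2 + \cdots + \cos^2\alpha_n = 1 \]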

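14.  Prove: If $\langle \mathbf{u}, \mathbf{v}\rangle_1$ and $\langle \mathbf{u}, \mathbf{v}\rangle_2$ are two inner products on a vector space V, then the quantity

$\langle \mathbf{u}, \mathbf{v}\rangle = \langle \mathbf{u}, \mathbf{v}\rangle_1 + \langle \mathbf{u}, \mathbf{v}\rangle_2$ is also an inner product.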

15.  Prove  Theorem  6.2.5. 

16.  Prove: If A has linearly independent column vectors, and if $\mathbf{b}$ is orthogonal to the column space of A, then
the least squares solution of $A\mathbf{x} = \mathbf{b}$ is $\mathbf{x} = \mathbf{0}$.

17.  Is there any value of s for which $x_1 = 1$ and $x_2 = 2$ is the least squares solution of the following linear
system?

\[ \begin{aligned} x_1 - x_2 &= 1 \\ 2x_1 + 3x_2 &= 1 \\ 4x_1 + 5x_2 &= s \end{aligned} \]

Explain  your  reasoning. 

Answer: 

No 

18.  Show that if p and q are distinct positive integers, then the functions $f(x) = \sin px$ and $g(x) = \sin qx$ are
orthogonal with respect to the inner product

\[ \langle f, g\rangle = \int_0^{2\pi} f(x)g(x)\,dx \]

19.  Show that if p and q are positive integers, then the functions $f(x) = \cos px$ and $g(x) = \sin qx$ are
orthogonal with respect to the inner product

\[ \langle f, g\rangle = \int_0^{2\pi} f(x)g(x)\,dx \]
INTRODUCTION

In Section 5.2 we found conditions that guaranteed the diagonalizability of an $n \times n$
matrix,  but  we  did  not  consider  what  class  or  classes  of  matrices  might  actually  satisfy 
those  conditions.  In  this  chapter  we  will  show  that  every  symmetric  matrix  is 
diagonalizable.  This  is  an  extremely  important  result  because  many  applications  utilize  it 
in  some  essential  way. 
7.1  Orthogonal Matrices

In  this  section  we  will  discuss  the  class  of  matrices  whose  inverses  can  be  obtained  by  transposition.  Such  matrices  occur  in  a variety  of 
applications  and  arise  as  well  as  transition  matrices  when  one  orthonormal  basis  is  changed  to  another. 


Orthogonal  Matrices 

We  begin  with  the  following  definition. 

DEFINITION 1

A square matrix A is said to be orthogonal if its transpose is the same as its inverse, that is, if

\[ A^{-1} = A^T \]

or, equivalently, if

\[ AA^T = A^TA = I \tag{1} \]

Recall from Theorem 1.6.3 that if either product in (1) holds, then
so does the other. Thus, A is orthogonal if either $AA^T = I$ or
$A^TA = I$.

EXAMPLE  1 A 3 x 3 Orthogonal  Matrix 


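The matrix

\[ A = \begin{bmatrix} \frac{3}{7} & \frac{2}{7} & \frac{6}{7} \\ -\frac{6}{7} & \frac{3}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{6}{7} & -\frac{3}{7} \end{bmatrix} \]

is orthogonal since

\[ A^TA = \begin{bmatrix} \frac{3}{7} & -\frac{6}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{3}{7} & \frac{6}{7} \\ \frac{6}{7} & \frac{2}{7} & -\frac{3}{7} \end{bmatrix} \begin{bmatrix} \frac{3}{7} & \frac{2}{7} & \frac{6}{7} \\ -\frac{6}{7} & \frac{3}{7} & \frac{2}{7} \\ \frac{2}{7} & \frac{6}{7} & -\frac{3}{7} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]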


EXAMPLE  2 Rotation  and  Reflection  Matrices  are  Orthogonal 


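Recall from Table 5 of Section 4.9 that the standard matrix for the counterclockwise rotation of $R^2$ through an angle $\theta$ is

\[ A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]

This matrix is orthogonal for all choices of $\theta$ since

\[ A^TA = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

We leave it for you to verify that the reflection matrices in Tables 1 and 2 and the rotation matrices in Table 6 of
Section 4.9 are all orthogonal.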
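
The computation in Example 2 can also be checked numerically. The following is a minimal sketch in plain Python; the helper names `rotation`, `gram`, and `is_identity` are illustrative choices and do not come from the text. It forms $A^TA$ for several sample angles and confirms that the result agrees with the identity matrix up to rounding error.

```python
import math

def rotation(theta):
    """Standard matrix for the counterclockwise rotation of R^2 through theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def gram(a):
    """Return A^T A for a square matrix given as a list of rows."""
    n = len(a)
    return [[sum(a[k][i] * a[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_identity(m, tol=1e-12):
    """True when every entry of m is within tol of the identity matrix."""
    n = len(m)
    return all(abs(m[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

# Sample angles chosen arbitrarily for illustration.
for theta in (0.0, 0.3, math.pi / 4, 2.0, 5.5):
    assert is_identity(gram(rotation(theta)))
```

A floating-point tolerance is used because $\cos^2\theta + \sin^2\theta$ is only approximately 1 in machine arithmetic.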


Observe  that  for  the  orthogonal  matrices  in  Example  1 and  Example  2,  both  the  row  vectors  and  the  column  vectors  form  orthonormal  sets  with 
respect  to  the  Euclidean  inner  product.  This  is  a consequence  of  the  following  theorem. 


THEOREM  7.1.1 

The  following  are  equivalent  for  an  n x n matrix  A. 

(a)  A is  orthogonal 

(b)  The  row  vectors  of  A form  an  orthonormal  set  in  Rn  with  the  Euclidean  inner  product. 

(c)  The  column  vectors  of  A form  an  orthonormal  set  in  Rn  with  the  Euclidean  inner  product. 

We  will  prove  the  equivalence  of  (a)  and  ( b ) and  leave  the  equivalence  of  (a)  and  (c)  as  an  exercise. 

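Proof (a) $\Leftrightarrow$ (b)  The entry in the ith row and jth column of the matrix product $AA^T$ is the dot product of the ith row vector of A and the jth column
vector of $A^T$ (see Formula 5 of Section 1.3). But except for a difference in form, the jth column vector of $A^T$ is the jth row vector of A. Thus, if the
row vectors of A are $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n$, then the matrix product $AA^T$ can be expressed as

\[ AA^T = \begin{bmatrix} \mathbf{r}_1 \cdot \mathbf{r}_1 & \mathbf{r}_1 \cdot \mathbf{r}_2 & \cdots & \mathbf{r}_1 \cdot \mathbf{r}_n \\ \mathbf{r}_2 \cdot \mathbf{r}_1 & \mathbf{r}_2 \cdot \mathbf{r}_2 & \cdots & \mathbf{r}_2 \cdot \mathbf{r}_n \\ \vdots & \vdots & & \vdots \\ \mathbf{r}_n \cdot \mathbf{r}_1 & \mathbf{r}_n \cdot \mathbf{r}_2 & \cdots & \mathbf{r}_n \cdot \mathbf{r}_n \end{bmatrix} \]

[see Formula 28 of Section 3.2]. Thus, it follows that $AA^T = I$ if and only if

\[ \mathbf{r}_1 \cdot \mathbf{r}_1 = \mathbf{r}_2 \cdot \mathbf{r}_2 = \cdots = \mathbf{r}_n \cdot \mathbf{r}_n = 1 \]

and

\[ \mathbf{r}_i \cdot \mathbf{r}_j = 0 \quad \text{when } i \neq j \]

which are true if and only if $\{\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n\}$ is an orthonormal set in $R^n$.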


WARNING 

Note  that  an  orthogonal  matrix  is  one  with  orthonormal  rows  and  columns — not  simply  orthogonal  rows  and  columns. 
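
The distinction in this warning can be seen in a small exact computation. The sketch below uses two hypothetical $2 \times 2$ matrices chosen only for illustration (they do not appear in the text), and the helper names are likewise assumptions. The first matrix has orthogonal rows of length 5, the second has the same rows normalized to length 1.

```python
from fractions import Fraction as F

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def is_orthogonal(m):
    """True when m times its transpose is exactly the identity matrix."""
    return matmul(m, transpose(m)) == identity(len(m))

# Hypothetical matrices for illustration only.
scaled = [[F(3), F(4)], [F(-4), F(3)]]            # rows orthogonal, each of length 5
unit = [[F(3, 5), F(4, 5)], [F(-4, 5), F(3, 5)]]  # the same rows, normalized

print(is_orthogonal(scaled))  # False: the product is 25 times the identity
print(is_orthogonal(unit))    # True: the rows are orthonormal
```

Exact rational arithmetic is used here so that the comparison with the identity needs no tolerance.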


The  following  theorem  lists  three  more  fundamental  properties  of  orthogonal  matrices.  The  proofs  are  all  straightforward  and  are  left  as  exercises. 


THEOREM  7.1.2 

(a)  The  inverse  of  an  orthogonal  matrix  is  orthogonal. 

(b)  A product  of  orthogonal  matrices  is  orthogonal. 

(c)  If A is orthogonal, then $\det(A) = 1$ or $\det(A) = -1$.


EXAMPLE  3  $\det(A) = \pm 1$ for an Orthogonal Matrix A


The matrix


is orthogonal since its row (and column) vectors form orthonormal sets in $R^2$ with the Euclidean inner product. We leave it for you
to verify that $\det(A) = 1$ and that interchanging the rows produces an orthogonal matrix whose determinant is $-1$.

Orthogonal  Matrices  as  Linear  Operators 

We observed in Example 2 that the standard matrices for the basic reflection and rotation operators on $R^2$ and $R^3$ are orthogonal. The next theorem
will  explain  why  this  is  so. 


THEOREM  7.1.3 

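If A is an $n \times n$ matrix, then the following are equivalent.

(a)  A is orthogonal.

(b)  $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$ in $R^n$.

(c)  $A\mathbf{x} \cdot A\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for all $\mathbf{x}$ and $\mathbf{y}$ in $R^n$.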

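We will prove the sequence of implications (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (a).

Proof (a) $\Rightarrow$ (b)  Assume that A is orthogonal, so that $A^TA = I$. It follows from Formula 26 of Section 3.2 that

\[ \|A\mathbf{x}\| = (A\mathbf{x} \cdot A\mathbf{x})^{1/2} = (\mathbf{x} \cdot A^TA\mathbf{x})^{1/2} = (\mathbf{x} \cdot \mathbf{x})^{1/2} = \|\mathbf{x}\| \]

Proof (b) $\Rightarrow$ (c)  Assume that $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$ in $R^n$. From Theorem 3.2.7 we have

\[ A\mathbf{x} \cdot A\mathbf{y} = \tfrac{1}{4}\|A\mathbf{x} + A\mathbf{y}\|^2 - \tfrac{1}{4}\|A\mathbf{x} - A\mathbf{y}\|^2 = \tfrac{1}{4}\|A(\mathbf{x} + \mathbf{y})\|^2 - \tfrac{1}{4}\|A(\mathbf{x} - \mathbf{y})\|^2 \]

\[ = \tfrac{1}{4}\|\mathbf{x} + \mathbf{y}\|^2 - \tfrac{1}{4}\|\mathbf{x} - \mathbf{y}\|^2 = \mathbf{x} \cdot \mathbf{y} \]

Proof (c) $\Rightarrow$ (a)  Assume that $A\mathbf{x} \cdot A\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for all $\mathbf{x}$ and $\mathbf{y}$ in $R^n$. It follows from Formula 26 of Section 3.2 that

\[ \mathbf{x} \cdot \mathbf{y} = \mathbf{x} \cdot A^TA\mathbf{y} \]

which can be rewritten as $\mathbf{x} \cdot (A^TA\mathbf{y} - \mathbf{y}) = 0$ or as

\[ \mathbf{x} \cdot (A^TA - I)\mathbf{y} = 0 \]

Since this equation holds for all $\mathbf{x}$ in $R^n$, it holds in particular if $\mathbf{x} = (A^TA - I)\mathbf{y}$, so

\[ (A^TA - I)\mathbf{y} \cdot (A^TA - I)\mathbf{y} = 0 \]

Thus, it follows from the positivity axiom for inner products that

\[ (A^TA - I)\mathbf{y} = \mathbf{0} \]

Since this equation is satisfied by every vector $\mathbf{y}$ in $R^n$, it must be that $A^TA - I$ is the zero matrix (why?) and hence that $A^TA = I$. Thus, A is
orthogonal.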


Theorem 7.1.3 has a useful geometric interpretation when considered from the viewpoint of matrix transformations: If A is an orthogonal matrix
and $T_A: R^n \to R^n$ is multiplication by A, then we will call $T_A$ an orthogonal operator on $R^n$. It follows from parts (a) and (b) of Theorem 7.1.3
that the orthogonal operators on $R^n$ are precisely those operators that leave the lengths of all vectors unchanged. This explains why, in Example 2,
we found the standard matrices for the basic reflections and rotations of $R^2$ and $R^3$ to be orthogonal.

Parts  (a)  and  (c)  of  Theorem  7.1.3  imply  that  orthogonal 
operators  leave  the  angle  between  two  vectors  unchanged.  Why? 
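
Parts (b) and (c) of Theorem 7.1.3 can be checked exactly on a small instance. In the sketch below the orthogonal matrix `A` and the vectors `x` and `y` are hypothetical values chosen for illustration, not data from the text. Squared lengths are compared rather than lengths so that the arithmetic stays rational.

```python
from fractions import Fraction as F

def apply(m, v):
    """Matrix-vector product, with the matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical orthogonal matrix (a rotation with cosine 3/5 and sine 4/5).
A = [[F(3, 5), F(-4, 5)],
     [F(4, 5), F(3, 5)]]

x = [F(2), F(-1)]
y = [F(1), F(3)]
Ax, Ay = apply(A, x), apply(A, y)

print(dot(Ax, Ax) == dot(x, x))   # squared lengths agree, so ||Ax|| = ||x||
print(dot(Ax, Ay) == dot(x, y))   # dot products agree
```

Since the angle between two nonzero vectors is determined by their dot product and their lengths, preserving both quantities preserves the angle as well.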


Change  of  Orthonormal  Basis 


Orthonormal  bases  for  inner  product  spaces  are  convenient  because,  as  the  following  theorem  shows,  many  familiar  formulas  hold  for  such  bases. 
We  leave  the  proof  as  an  exercise. 


THEOREM  7.1.4 


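If S is an orthonormal basis for an n-dimensional inner product space V, and if

\[ (\mathbf{u})_S = (u_1, u_2, \ldots, u_n) \quad \text{and} \quad (\mathbf{v})_S = (v_1, v_2, \ldots, v_n) \]

then:

(a)  $\|\mathbf{u}\| = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}$

(b)  $d(\mathbf{u}, \mathbf{v}) = \sqrt{(u_1 - v_1)^2 + (u_2 - v_2)^2 + \cdots + (u_n - v_n)^2}$

(c)  $\langle \mathbf{u}, \mathbf{v}\rangle = u_1v_1 + u_2v_2 + \cdots + u_nv_n$


Note that the three parts of Theorem 7.1.4 can be expressed as

\[ \|\mathbf{u}\| = \|(\mathbf{u})_S\| \qquad d(\mathbf{u}, \mathbf{v}) = d((\mathbf{u})_S, (\mathbf{v})_S) \qquad \langle \mathbf{u}, \mathbf{v}\rangle = \langle (\mathbf{u})_S, (\mathbf{v})_S\rangle \]

where the norm, distance, and inner product on the left sides are relative to the inner product on V and on the right sides are relative to the
Euclidean inner product on $R^n$.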


Transitions  between  orthonormal  bases  for  an  inner  product  space  are  of  special  importance  in  geometry  and  various  applications.  The  following 
theorem,  whose  proof  is  deferred  to  the  end  of  this  section,  is  concerned  with  transitions  of  this  type. 


THEOREM  7.1.5 

Let V be a finite-dimensional inner product space. If P is the transition matrix from one orthonormal basis for V to another orthonormal
basis  for  V,  then  P is  an  orthogonal  matrix. 


EXAMPLE  4 Rotation  of  Axes  in  2-Space 


In many problems a rectangular xy-coordinate system is given, and a new x'y'-coordinate system is obtained by rotating the
xy-system counterclockwise about the origin through an angle $\theta$. When this is done, each point Q in the plane has two sets of
coordinates: coordinates $(x, y)$ relative to the xy-system and coordinates $(x', y')$ relative to the x'y'-system (Figure 7.1.1a).


Figure 7.1.1 (panels (a) through (d))

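By introducing unit vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ along the positive x- and y-axes and unit vectors $\mathbf{u}'_1$ and $\mathbf{u}'_2$ along the positive x'- and y'-axes,
we can regard this rotation as a change from an old basis $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ to a new basis $B' = \{\mathbf{u}'_1, \mathbf{u}'_2\}$ (Figure 7.1.1b). Thus, the new
coordinates $(x', y')$ and the old coordinates $(x, y)$ of a point Q will be related by

\[ \begin{bmatrix} x \\ y \end{bmatrix} = P \begin{bmatrix} x' \\ y' \end{bmatrix} \tag{2} \]

where P is the transition matrix from B' to B. To find P we must determine the coordinate matrices of the new basis vectors $\mathbf{u}'_1$ and $\mathbf{u}'_2$
relative to the old basis. As indicated in Figure 7.1.1c, the components of $\mathbf{u}'_1$ in the old basis are $\cos\theta$ and $\sin\theta$, so

\[ [\mathbf{u}'_1]_B = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \]

Similarly, from Figure 7.1.1d, we see that the components of $\mathbf{u}'_2$ in the old basis are $\cos(\theta + \pi/2) = -\sin\theta$ and
$\sin(\theta + \pi/2) = \cos\theta$, so

\[ [\mathbf{u}'_2]_B = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix} \]

Thus the transition matrix from B' to B is

\[ P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{3} \]

Observe that P is an orthogonal matrix, as expected, since B and B' are orthonormal bases. Thus

\[ P^{-1} = P^T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \]

so (2) yields

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{4} \]

or, equivalently,

\[ \begin{aligned} x' &= x\cos\theta + y\sin\theta \\ y' &= -x\sin\theta + y\cos\theta \end{aligned} \tag{5} \]

These are sometimes called the rotation equations for $R^2$.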


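EXAMPLE  5 Rotation  of  Axes  in  2-Space

Use form (4) of the rotation equations for $R^2$ to find the new coordinates of the point $Q(2, -1)$ if the coordinate axes of a rectangular
coordinate system are rotated through an angle of $\theta = \pi/4$.


Solution  Since

\[ \sin\frac{\pi}{4} = \cos\frac{\pi}{4} = \frac{1}{\sqrt{2}} \]

the equation in (4) becomes

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]

Thus, if the old coordinates of a point Q are $(x, y) = (2, -1)$, then

\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{3}{\sqrt{2}} \end{bmatrix} \]

so the new coordinates of Q are $(x', y') = \left(\frac{1}{\sqrt{2}}, -\frac{3}{\sqrt{2}}\right)$.

Observe that the coefficient matrix in (4) is the same as the standard matrix for the linear operator that rotates the vectors of $R^2$ through
the angle $-\theta$ (see margin note for Table 5 of Section 4.9). This is to be expected since rotating the coordinate axes through the angle $\theta$ with the
vectors of $R^2$ kept fixed has the same effect as rotating the vectors in $R^2$ through the angle $-\theta$ with the axes kept fixed.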
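
The rotation equations for $R^2$ translate directly into code. The following minimal sketch assumes the point with old coordinates $(2, -1)$ and the angle $\pi/4$ from Example 5; the function names `rotate_axes` and `unrotate_axes` are illustrative and not from the text.

```python
import math

def rotate_axes(x, y, theta):
    """New coordinates (x', y') of a point after the axes are rotated
    counterclockwise through theta (the point itself stays fixed)."""
    c, s = math.cos(theta), math.sin(theta)
    return (x * c + y * s, -x * s + y * c)

def unrotate_axes(xp, yp, theta):
    """Old coordinates (x, y) recovered from (x', y') by applying P."""
    c, s = math.cos(theta), math.sin(theta)
    return (xp * c - yp * s, xp * s + yp * c)

new = rotate_axes(2.0, -1.0, math.pi / 4)
print(new)                                   # approximately (1/sqrt(2), -3/sqrt(2))
print(unrotate_axes(*new, math.pi / 4))      # approximately (2, -1)
```

Because the transition matrix is orthogonal, the inverse map needs no matrix inversion: it simply uses the transposed coefficients.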


EXAMPLE  6 Application  to  Rotation  of  Axes  in  3-Space 


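Suppose that a rectangular xyz-coordinate system is rotated around its z-axis counterclockwise (looking down the positive z-axis)
through an angle $\theta$ (Figure 7.1.2). If we introduce unit vectors $\mathbf{u}_1$, $\mathbf{u}_2$, and $\mathbf{u}_3$ along the positive x-, y-, and z-axes and unit vectors $\mathbf{u}'_1$,
$\mathbf{u}'_2$, and $\mathbf{u}'_3$ along the positive x'-, y'-, and z'-axes, we can regard the rotation as a change from the old basis $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ to the
new basis $B' = \{\mathbf{u}'_1, \mathbf{u}'_2, \mathbf{u}'_3\}$. In light of Example 4, it should be evident that

\[ [\mathbf{u}'_1]_B = \begin{bmatrix} \cos\theta \\ \sin\theta \\ 0 \end{bmatrix} \quad \text{and} \quad [\mathbf{u}'_2]_B = \begin{bmatrix} -\sin\theta \\ \cos\theta \\ 0 \end{bmatrix} \]

Moreover, since $\mathbf{u}'_3$ extends 1 unit up the positive z'-axis,

\[ [\mathbf{u}'_3]_B = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]

Figure 7.1.2

It follows that the transition matrix from B' to B is

\[ P = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

and the transition matrix from B to B' is

\[ P^{-1} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

(verify). Thus, the new coordinates $(x', y', z')$ of a point Q can be computed from its old coordinates $(x, y, z)$ by

\[ \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]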
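
The change of coordinates in Example 6 can be sketched in the same way as the planar case. The function name `rotate_axes_about_z` and the sample point are assumptions made for illustration; the code applies the transposed transition matrix and checks that the z-coordinate and the length of the coordinate vector are unchanged.

```python
import math

def rotate_axes_about_z(point, theta):
    """New coordinates (x', y', z') after the axes are rotated counterclockwise
    about the z-axis through theta; the point itself stays fixed."""
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (x * c + y * s, -x * s + y * c, z)

def length(v):
    return math.sqrt(sum(t * t for t in v))

old = (1.0, 2.0, 3.0)              # hypothetical sample point
new = rotate_axes_about_z(old, math.pi / 6)

print(new)
print(abs(length(new) - length(old)) < 1e-12)  # lengths agree up to rounding
```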


OPTIONAL 

We  conclude  this  section  with  an  optional  proof  of  Theorem  7.1.5. 

Proof of Theorem 7.1.5  Assume that V is an n-dimensional inner product space and that P is the transition matrix from an orthonormal basis
B' to an orthonormal basis B. We will denote the norm relative to the inner product on V by the symbol $\|\ \|_V$ to distinguish it from the norm
relative to the Euclidean inner product on $R^n$, which we will denote by $\|\ \|$.


Recall that $(\mathbf{u})_S$ denotes a coordinate vector expressed in
comma-delimited form whereas $[\mathbf{u}]_S$ denotes a coordinate vector
expressed in column form.

To prove that P is orthogonal, we will use Theorem 7.1.3 and show that $\|P\mathbf{x}\| = \|\mathbf{x}\|$ for every vector $\mathbf{x}$ in $R^n$. As a first step in this direction,
recall from Theorem 7.1.4a that for any orthonormal basis for V the norm of any vector $\mathbf{u}$ in V is the same as the norm of its coordinate vector with
respect to the Euclidean inner product, that is,

\[ \|\mathbf{u}\|_V = \|(\mathbf{u})_{B'}\| = \|(\mathbf{u})_B\| \]

or

\[ \|\mathbf{u}\|_V = \|[\mathbf{u}]_{B'}\| = \|[\mathbf{u}]_B\| \tag{6} \]

Now let $\mathbf{x}$ be any vector in $R^n$, and let $\mathbf{u}$ be the vector in V whose coordinate vector with respect to the basis B' is $\mathbf{x}$; that is, $[\mathbf{u}]_{B'} = \mathbf{x}$. Since $[\mathbf{u}]_B = P[\mathbf{u}]_{B'} = P\mathbf{x}$, it follows from
(6) that

\[ \|\mathbf{u}\|_V = \|\mathbf{x}\| = \|P\mathbf{x}\| \]

which proves that P is orthogonal.


Concept  Review 

Orthogonal  matrix 
Orthogonal  operator 
Properties  of  orthogonal  matrices. 

Geometric  properties  of  an  orthogonal  operator 

Properties  of  transition  matrices  from  one  orthonormal  basis  to  another. 

Skills 

Be  able  to  identify  an  orthogonal  matrix. 

Know  the  possible  values  for  the  determinant  of  an  orthogonal  matrix. 
Find  the  new  coordinates  of  a point  resulting  from  a rotation  of  axes. 


Exercise  Set  7.1 

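1.  (a)  Show that the matrix

\[ A = \begin{bmatrix} \frac{4}{5} & 0 & -\frac{3}{5} \\ -\frac{9}{25} & \frac{4}{5} & -\frac{12}{25} \\ \frac{12}{25} & \frac{3}{5} & \frac{16}{25} \end{bmatrix} \]

is orthogonal in three ways: by calculating $A^TA$, by using part (b) of Theorem 7.1.1, and by using part (c) of Theorem 7.1.1.
(b)  Find the inverse of the matrix A in part (a).


Answer:


(b)
\[ A^{-1} = \begin{bmatrix} \frac{4}{5} & -\frac{9}{25} & \frac{12}{25} \\ 0 & \frac{4}{5} & \frac{3}{5} \\ -\frac{3}{5} & -\frac{12}{25} & \frac{16}{25} \end{bmatrix} \]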

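2.  (a)  Show that the matrix

\[ A = \begin{bmatrix} \frac{1}{3} & \frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & -\frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} & -\frac{2}{3} \end{bmatrix} \]

is orthogonal.


(b)  Let $T: R^3 \to R^3$ be multiplication by the matrix A in part (a). Find $T(\mathbf{x})$ for the vector $\mathbf{x} = (-2, 3, 5)$. Using the Euclidean inner product

on $R^3$, verify that $\|T(\mathbf{x})\| = \|\mathbf{x}\|$.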

3.  Determine  which  of  the  following  matrices  are  orthogonal.  For  those  that  are  orthogonal,  find  the  inverse. 


(d) 


(e) 


(f> 


(a) 

'1  O' 

.0 

(b) 

" 1 

1 

f2 

~f2 

1 

1 

f2 

f2 

(c) 

0 1 

1 

h 

1 0 

0 

0 0 

1 

f2 


1 1 1 

{2  /6  {3 

_2_  1 

Ve  {3 

1 1 1 

1/2/61/3 


0 


1 

2 

1 

2 

1 

2 

1 

2 

1 0 


1 

2 

1 

6 

1 

6 

5 

"6 

0 0 


• + -i' 


/5 


0 -j= 


f3 


0 1 


o d=  lo 


Answer: 


(a) 

(b) 

(d) 


1 0 
0 1 

{2  {i 

1_  J_ 

{2  {2. 


,J_  0 -L 

f2  f2 

J_ 2_  J_ 

1 _1 1_ 

{3  {3  {3 


. Using  the  Euclidean  inner  product 


(e) 


1 

2 

1 

2 

1 

2 

1 

2 


1 

2 

5 

6 

1 

6 

1 

6 


1 

2 

1 

6 

I 

6 

5 

6 


1 

2 

1 

6 

5 

6 

1 

6 


4.  Prove that if A is orthogonal, then $A^T$ is orthogonal.

5.  Verify that the reflection matrices in Tables 1 and 2 of Section 4.9 are orthogonal.

6.  Let a rectangular x'y'-coordinate system be obtained by rotating a rectangular xy-coordinate system counterclockwise through the angle
$\theta = 3\pi/4$.

(a)  Find the x'y'-coordinates of the point whose xy-coordinates are $(-2, 6)$.

(b)  Find the xy-coordinates of the point whose x'y'-coordinates are $(5, 2)$.


7.  Repeat Exercise 6 with $\theta = \pi/3$.


Answer: 


(a)  $\left(-1 + 3\sqrt{3},\ 3 + \sqrt{3}\right)$

(b)  $\left(\frac{5}{2} - \sqrt{3},\ \frac{5}{2}\sqrt{3} + 1\right)$

8.  Let a rectangular x'y'z'-coordinate system be obtained by rotating a rectangular xyz-coordinate system counterclockwise about the z-axis
(looking down the z-axis) through the angle $\theta = \pi/4$.

(a)  Find the x'y'z'-coordinates of the point whose xyz-coordinates are $(-1, 2, 5)$.

(b)  Find the xyz-coordinates of the point whose x'y'z'-coordinates are $(1, 6, -3)$.

9.  Repeat Exercise 8 for a rotation of $\theta = \pi/3$ counterclockwise about the y-axis (looking along the positive y-axis toward the origin).

Answer:


10.  Repeat Exercise 8 for a rotation of $\theta = 3\pi/4$ counterclockwise about the x-axis (looking along the positive x-axis toward the origin).

11.  (a)  A rectangular x'y'z'-coordinate system is obtained by rotating an xyz-coordinate system counterclockwise about the y-axis through an
angle $\theta$ (looking along the positive y-axis toward the origin). Find a matrix A such that

\[ \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = A \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]

where $(x, y, z)$ and $(x', y', z')$ are the coordinates of the same point in the xyz- and x'y'z'-systems, respectively.

(b)  Repeat part (a) for a rotation about the x-axis.


Answer: 


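(a)
\[ A = \begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix} \]

(b)
\[ A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{bmatrix} \]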

12.  A rectangular x''y''z''-coordinate system is obtained by first rotating a rectangular xyz-coordinate system 60° counterclockwise about the
z-axis (looking down the positive z-axis) to obtain an x'y'z'-coordinate system, and then rotating the x'y'z'-coordinate system 45°
counterclockwise about the y'-axis (looking along the positive y'-axis toward the origin). Find a matrix A such that

\[ \begin{bmatrix} x'' \\ y'' \\ z'' \end{bmatrix} = A \begin{bmatrix} x \\ y \\ z \end{bmatrix} \]

where $(x, y, z)$ and $(x'', y'', z'')$ are the xyz- and x''y''z''-coordinates of the same point.

13.  What conditions must a and b satisfy for the matrix

\[ \begin{bmatrix} a + b & b - a \\ a - b & b + a \end{bmatrix} \]

to be orthogonal?


Answer:


$a^2 + b^2 = \frac{1}{2}$

14.  Prove that a $2 \times 2$ orthogonal matrix A has only one of two possible forms:

\[ A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad \text{or} \quad A = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix} \]

where $0 \leq \theta < 2\pi$.  [Hint: Start with a general $2 \times 2$ matrix A, and use the fact that the column vectors form an orthonormal set in $R^2$.]

15.  (a)  Use the result in Exercise 14 to prove that multiplication by a $2 \times 2$ orthogonal matrix is either a rotation or a reflection about the x-axis followed by a
rotation.

(b)  Prove that multiplication by A is a rotation if $\det(A) = 1$ and is a reflection followed by a rotation if $\det(A) = -1$.


16.  Use the result in Exercise 15 to determine whether multiplication by A is a rotation or a reflection about the x-axis followed by a rotation.
Find the angle of rotation in either case.


(a) 


A = 


f2 

f2 


f2 

f2 


(b) 


2 


1 

2 


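17.  Find a, b, and c for which the matrix

\[ \begin{bmatrix} a & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ b & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ c & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix} \]

is orthogonal. Are the values of a, b, and c unique? Explain.


Answer:


The only possibilities are $a = 0$, $b = -\frac{2}{\sqrt{6}}$, $c = \frac{1}{\sqrt{3}}$ or $a = 0$, $b = \frac{2}{\sqrt{6}}$, $c = -\frac{1}{\sqrt{3}}$.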


18.  The result in Exercise 15 has an analog for $3 \times 3$ orthogonal matrices: It can be proved that multiplication by a $3 \times 3$ orthogonal matrix A is a
rotation about some axis if $\det(A) = 1$ and is a rotation about some axis followed by a reflection about some coordinate plane if $\det(A) = -1$.
Determine whether multiplication by A is a rotation or a rotation followed by a reflection.


(a) 


A = 


1 1 

7 7 

6 3 

7 7 
2 6 
7 7 


6 

7 

2 

7 

3 

7 


(b) 


2 3 

7 7 


6 2 

7 7 

19.  Use the fact stated in Exercise 18 and part (b) of Theorem 7.1.2 to show that a composition of rotations can always be accomplished by a single
rotation about some appropriate axis.

20.  Prove the equivalence of statements (a) and (c) in Theorem 7.1.1.

21.  A linear operator on $R^2$ is called rigid if it does not change the lengths of vectors, and it is called angle preserving if it does not change the
angle between nonzero vectors.

(a)  Name two different types of linear operators that are rigid.

(b)  Name two different types of linear operators that are angle preserving.

(c)  Are there any linear operators on $R^2$ that are rigid and not angle preserving? Angle preserving and not rigid? Justify your answer.

Answer:

(a)  Rotations about the origin, reflections about any line through the origin, and any combination of these

(b)  Rotations about the origin, dilations, contractions, reflections about lines through the origin, and combinations of these

(c)  No; dilations and contractions

True-False Exercises

In  parts  (a)-(h)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 


(a)  The matrix

\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \]

is orthogonal.


Answer: 

False 

(b)  The matrix


-2 

1 


is  orthogonal. 


Answer: 

False 

(c)  An $m \times n$ matrix A is orthogonal if $A^TA = I$.
Answer: 

False 

(d)  A square  matrix  whose  columns  form  an  orthogonal  set  is  orthogonal. 
Answer: 

False 

(e)  Every  orthogonal  matrix  is  invertible. 

Answer: 

True 

(f)  If  A is  an  orthogonal  matrix,  then  is  orthogonal  and  (det  A) 2 = 1 . 


Answer: 

True 


(g)  Every  eigenvalue  of  an  orthogonal  matrix  has  absolute  value  1 . 


Answer: 


True 

(h)  If A is a square matrix and $\|A\mathbf{u}\| = 1$ for all unit vectors $\mathbf{u}$, then A is orthogonal.
Answer: 

True 
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7.2  Orthogonal  Diagonalization 

In  this  section  we  will  be  concerned  with  the  problem  of  diagonalizing  a symmetric  matrix  A.  As  we  will  see,  this  problem  is 
closely  related  to  that  of  finding  an  orthonormal  basis  for  Rn  that  consists  of  eigenvectors  of  A.  Problems  of  this  type  are 
important  because  many  of  the  matrices  that  arise  in  applications  are  symmetric. 


The  Orthogonal  Diagonalization  Problem 

In Definition 1 of Section 5.2 we defined two square matrices, A and B, to be similar if there is an invertible matrix P such
that $P^{-1}AP = B$. In this section we will be concerned with the special case in which it is possible to find an orthogonal
matrix P for which this relationship holds.

We  begin  with  the  following  definition. 


DEFINITION  1 

If A and B are square matrices, then we say that A and B are orthogonally similar if there is an orthogonal matrix P
such that $P^TAP = B$.


If  A is  orthogonally  similar  to  some  diagonal  matrix,  say 

\[ P^TAP = D \]

then  we  say  that  A is  orthogonally  diagonalizable  and  that  P orthogonally  diagonalizes  A. 

Our  first  goal  in  this  section  is  to  determine  what  conditions  a matrix  must  satisfy  to  be  orthogonally  diagonalizable.  As  a 
first  step,  observe  that  there  is  no  hope  of  orthogonally  diagonalizing  a matrix  that  is  not  symmetric.  To  see  why  this  is  so, 
suppose  that 

\[ P^TAP = D \tag{1} \]

where P is an orthogonal matrix and D is a diagonal matrix. Multiplying the left side of (1) by P, the right side by $P^T$, and then
using the fact that $PP^T = P^TP = I$, we can rewrite this equation as

\[ A = PDP^T \tag{2} \]

Now transposing both sides of this equation and using the fact that a diagonal matrix is the same as its transpose we obtain

\[ A^T = (PDP^T)^T = (P^T)^TD^TP^T = PDP^T = A \]

so A must be symmetric.


Conditions  for  Orthogonal  Diagonalizability 


The  following  theorem  shows  that  every  symmetric  matrix  is,  in  fact,  orthogonally  diagonalizable.  In  this  theorem,  and  for 
the  remainder  of  this  section,  orthogonal  will  mean  orthogonal  with  respect  to  the  Euclidean  inner  product  on  Rn. 


THEOREM  7.2.1 


If  A is  an  n x n matrix,  then  the  following  are  equivalent. 

(a)  A is  orthogonally  diagonalizable. 

(b)  A has  an  orthonormal  set  of  n eigenvectors. 

(c)  A is  symmetric. 


Proof (a) $\Rightarrow$ (b)  Since A is orthogonally diagonalizable, there is an orthogonal matrix P such that $P^{-1}AP$ is diagonal. As shown in
the proof of Theorem 5.2.1, the n column vectors of P are eigenvectors of A. Since P is orthogonal, these column vectors are
orthonormal, so A has n orthonormal eigenvectors.

Proof (b) $\Rightarrow$ (a)  Assume that A has an orthonormal set of n eigenvectors $\{\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n\}$. As shown in the proof of Theorem 5.2.1,
the matrix P with these eigenvectors as columns diagonalizes A. Since these eigenvectors are orthonormal, P is orthogonal
and thus orthogonally diagonalizes A.

Proof (a) $\Rightarrow$ (c)  In the proof that (a) $\Rightarrow$ (b) we showed that an orthogonally diagonalizable $n \times n$ matrix A is orthogonally
diagonalized by an $n \times n$ matrix P whose columns form an orthonormal set of eigenvectors of A. Let D be the diagonal
matrix

\[ D = P^TAP \]

from which it follows that

\[ A = PDP^T \]

Thus,

\[ A^T = (PDP^T)^T = PD^TP^T = PDP^T = A \]

which shows that A is symmetric.

Proof (c) $\Rightarrow$ (a)  The proof of this part is beyond the scope of this text and will be omitted.


Properties  of  Symmetric  Matrices 

Our  next  goal  is  to  devise  a procedure  for  orthogonally  diagonalizing  a symmetric  matrix,  but  before  we  can  do  so,  we  need 
the  following  critical  theorem  about  eigenvalues  and  eigenvectors  of  symmetric  matrices. 


THEOREM  7.2.2 

If  A is  a symmetric  matrix,  then: 

(a)  The eigenvalues of A are all real numbers.

(b)  Eigenvectors  from  different  eigenspaces  are  orthogonal. 


Part  (a),  which  requires  results  about  complex  vector  spaces,  will  be  discussed  in  Section  7.5. 


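Proof  (b)  Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be eigenvectors corresponding to distinct eigenvalues $\lambda_1$ and $\lambda_2$ of the matrix A. We want to show
that $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$. Our proof of this involves the trick of starting with the expression $A\mathbf{v}_1 \cdot \mathbf{v}_2$. It follows from Formula 26 of
Section 3.2 and the symmetry of A that

\[ A\mathbf{v}_1 \cdot \mathbf{v}_2 = \mathbf{v}_1 \cdot A^T\mathbf{v}_2 = \mathbf{v}_1 \cdot A\mathbf{v}_2 \tag{3} \]

But $\mathbf{v}_1$ is an eigenvector of A corresponding to $\lambda_1$, and $\mathbf{v}_2$ is an eigenvector of A corresponding to $\lambda_2$, so (3) yields the
relationship

\[ \lambda_1\mathbf{v}_1 \cdot \mathbf{v}_2 = \mathbf{v}_1 \cdot \lambda_2\mathbf{v}_2 \]

which can be rewritten as

\[ (\lambda_1 - \lambda_2)(\mathbf{v}_1 \cdot \mathbf{v}_2) = 0 \tag{4} \]

But $\lambda_1 - \lambda_2 \neq 0$, since $\lambda_1$ and $\lambda_2$ were assumed distinct. Thus, it follows from (4) that $\mathbf{v}_1 \cdot \mathbf{v}_2 = 0$.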
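
Part (b) of Theorem 7.2.2 can be checked by hand or with a few lines of integer arithmetic. The sketch below uses the symmetric matrix with 4 on the diagonal and 2 elsewhere (the matrix of Example 1 in this section); the helper names are illustrative. It also shows that two eigenvectors from the same eigenspace need not be orthogonal, which is why the Gram-Schmidt process is needed there.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def is_eigenvector(m, v, lam):
    """True when m v equals lam v exactly (integer arithmetic)."""
    return matvec(m, v) == [lam * t for t in v]

A = [[4, 2, 2],
     [2, 4, 2],
     [2, 2, 4]]

u1 = [-1, 1, 0]   # eigenvalue 2
u2 = [-1, 0, 1]   # eigenvalue 2
u3 = [1, 1, 1]    # eigenvalue 8

assert is_eigenvector(A, u1, 2) and is_eigenvector(A, u2, 2)
assert is_eigenvector(A, u3, 8)

print(dot(u1, u3), dot(u2, u3))  # 0 0: different eigenspaces, so orthogonal
print(dot(u1, u2))               # 1: same eigenspace, not orthogonal
```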


Theorem  7.2.2  yields  the  following  procedure  for  orthogonally  diagonalizing  a symmetric  matrix. 


Orthogonally Diagonalizing an $n \times n$ Symmetric Matrix

Step 1  Find a basis for each eigenspace of A.

Step 2  Apply the Gram-Schmidt process to each of these bases to obtain an orthonormal basis for each eigenspace.

Step 3  Form the matrix P whose columns are the vectors constructed in Step 2. This matrix will orthogonally
diagonalize A, and the eigenvalues on the diagonal of $D = P^TAP$ will be in the same order as their corresponding
eigenvectors in P.

The  justification  of  this  procedure  should  be  clear:  Theorem  7.2.2  ensures  that  eigenvectors  from  different 
eigenspaces  are  orthogonal,  and  applying  the  Gram-Schmidt  process  ensures  that  the  eigenvectors  within  the  same 
eigenspace  are  orthonormal.  It  follows  that  the  entire  set  of  eigenvectors  obtained  by  this  procedure  will  be  orthonormal. 
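
Step 2 of the procedure can be sketched as follows. This is a minimal illustration, not the text's own algorithm statement: it subtracts projections from the running remainder (the "modified" ordering of Gram-Schmidt, which gives the same result as the classical ordering in exact arithmetic), and the function name `gram_schmidt` and the tolerance are assumptions. The input vectors are the eigenspace basis $(-1, 1, 0)$, $(-1, 0, 1)$ for the eigenvalue 2 of the matrix in Example 1.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list spanning the same space as `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)                      # component of w along q
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm < tol:
            raise ValueError("input vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis

v1, v2 = gram_schmidt([[-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]])
print(v1)  # approximately (-1/sqrt(2), 1/sqrt(2), 0)
print(v2)  # approximately (-1/sqrt(6), -1/sqrt(6), 2/sqrt(6))
```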


EXAMPLE  1 Orthogonally  Diagonalizing  a Symmetric  Matrix 


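Find an orthogonal matrix P that diagonalizes

$$A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}$$

We leave it for you to verify that the characteristic equation of A is

$$\det(\lambda I - A) = \det\begin{bmatrix} \lambda - 4 & -2 & -2 \\ -2 & \lambda - 4 & -2 \\ -2 & -2 & \lambda - 4 \end{bmatrix} = (\lambda - 2)^2(\lambda - 8) = 0$$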


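Thus, the distinct eigenvalues of A are $\lambda = 2$ and $\lambda = 8$. By the method used in Example 7 of Section 5.1, it can be shown that

$$u_1 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad u_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$$ (5)

form a basis for the eigenspace corresponding to $\lambda = 2$. Applying the Gram-Schmidt process to $\{u_1, u_2\}$ yields the following orthonormal eigenvectors (verify):

$$v_1 = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{bmatrix} \quad \text{and} \quad v_2 = \begin{bmatrix} -\frac{1}{\sqrt{6}} \\ -\frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{bmatrix}$$ (6)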


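The eigenspace corresponding to $\lambda = 8$ has

$$u_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

as a basis. Applying the Gram-Schmidt process to $\{u_3\}$ (i.e., normalizing $u_3$) yields

$$v_3 = \begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix}$$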


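Finally, using $v_1$, $v_2$, and $v_3$ as column vectors, we obtain

$$P = \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix}$$

which orthogonally diagonalizes A. As a check, we leave it for you to confirm that

$$P^TAP = \begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{bmatrix} \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix} \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 8 \end{bmatrix}$$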
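The three-step procedure can also be checked numerically. The following is a minimal sketch in plain Python, not part of the text: the helper names `dot`, `gram_schmidt`, and `mat_vec` are our own, and the eigenspace bases are simply the ones quoted in Example 1. It orthonormalizes each basis and then forms the entries of $P^TAP$, which should come out close to the diagonal matrix with entries 2, 2, 8.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Return an orthonormal list spanning the same space as `basis`."""
    ortho = []
    for u in basis:
        w = list(u)
        for q in ortho:
            c = dot(u, q)                       # component of u along q
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        ortho.append([wi / norm for wi in w])
    return ortho

def mat_vec(M, v):
    return [dot(row, v) for row in M]

A = [[4, 2, 2], [2, 4, 2], [2, 2, 4]]

# Steps 1 and 2: eigenspace bases from Example 1 (lambda = 2, then lambda = 8),
# each orthonormalized separately.
columns = gram_schmidt([[-1, 1, 0], [-1, 0, 1]]) + gram_schmidt([[1, 1, 1]])

# Step 3: entry (i, j) of P^T A P is v_i . (A v_j), where the v_j are the columns of P.
D = [[dot(vi, mat_vec(A, vj)) for vj in columns] for vi in columns]
for row in D:
    print([round(entry, 10) for entry in row])
```

Because the arithmetic is floating point, the off-diagonal entries are only zero up to rounding error.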


Spectral  Decomposition 


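If A is a symmetric matrix that is orthogonally diagonalized by

$$P = [u_1 \quad u_2 \quad \cdots \quad u_n]$$

and if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A corresponding to the unit eigenvectors $u_1, u_2, \ldots, u_n$, then we know that $D = P^TAP$, where D is a diagonal matrix with the eigenvalues in the diagonal positions. It follows from this that the matrix A can be expressed as

$$A = PDP^T = [u_1 \quad u_2 \quad \cdots \quad u_n] \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_n^T \end{bmatrix} = [\lambda_1 u_1 \quad \lambda_2 u_2 \quad \cdots \quad \lambda_n u_n] \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_n^T \end{bmatrix}$$

Multiplying out, we obtain the formula

$$A = \lambda_1 u_1 u_1^T + \lambda_2 u_2 u_2^T + \cdots + \lambda_n u_n u_n^T$$ (7)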


which  is  called  a spectral  decomposition  of  A. 

Note that each term of the spectral decomposition of A has the form $\lambda u u^T$, where u is a unit eigenvector of A in column form, and $\lambda$ is an eigenvalue of A corresponding to u. Since u has size n × 1, it follows that the product $uu^T$ has size n × n. It can be proved (though we will not do it) that $uu^T$ is the standard matrix for the orthogonal projection of $R^n$ on the subspace spanned by the vector u. Accepting this to be so, the spectral decomposition of A tells us that the image of a vector x under multiplication by a symmetric matrix A can be obtained by projecting x orthogonally on the lines (one-dimensional subspaces) determined by the eigenvectors of A, then scaling those projections by the eigenvalues, and then adding the scaled projections. Here is an example.

EXAMPLE  2 A Geometric  Interpretation  of  a Spectral  Decomposition 


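The matrix

$$A = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}$$

has eigenvalues $\lambda_1 = -3$ and $\lambda_2 = 2$ with corresponding eigenvectors

$$x_1 = \begin{bmatrix} 1 \\ -2 \end{bmatrix} \quad \text{and} \quad x_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

(verify). Normalizing these basis vectors yields

$$u_1 = \frac{x_1}{\|x_1\|} = \begin{bmatrix} \frac{1}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} \end{bmatrix} \quad \text{and} \quad u_2 = \frac{x_2}{\|x_2\|} = \begin{bmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}$$

so a spectral decomposition of A is

$$\begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix} = \lambda_1 u_1 u_1^T + \lambda_2 u_2 u_2^T = (-3)\begin{bmatrix} \frac{1}{\sqrt{5}} \\ -\frac{2}{\sqrt{5}} \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \end{bmatrix} + (2)\begin{bmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}\begin{bmatrix} \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{bmatrix} = (-3)\begin{bmatrix} \frac{1}{5} & -\frac{2}{5} \\ -\frac{2}{5} & \frac{4}{5} \end{bmatrix} + (2)\begin{bmatrix} \frac{4}{5} & \frac{2}{5} \\ \frac{2}{5} & \frac{1}{5} \end{bmatrix}$$ (8)

where, as noted above, the 2 × 2 matrices on the right side of 8 are the standard matrices for the orthogonal projections onto the eigenspaces corresponding to $\lambda_1 = -3$ and $\lambda_2 = 2$, respectively.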

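Now let us see what this spectral decomposition tells us about the image of the vector x = (1, 1) under multiplication by A. Writing x in column form, it follows that

$$Ax = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix}$$ (9)

and from 8 that

$$Ax = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = (-3)\begin{bmatrix} \frac{1}{5} & -\frac{2}{5} \\ -\frac{2}{5} & \frac{4}{5} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} + (2)\begin{bmatrix} \frac{4}{5} & \frac{2}{5} \\ \frac{2}{5} & \frac{1}{5} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = (-3)\begin{bmatrix} -\frac{1}{5} \\ \frac{2}{5} \end{bmatrix} + (2)\begin{bmatrix} \frac{6}{5} \\ \frac{3}{5} \end{bmatrix} = \begin{bmatrix} \frac{3}{5} \\ -\frac{6}{5} \end{bmatrix} + \begin{bmatrix} \frac{12}{5} \\ \frac{6}{5} \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix}$$ (10)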


Formulas 9 and 10 provide two different ways of viewing the image of the vector (1, 1) under multiplication by A: Formula 9 tells us directly that the image of this vector is (3, 0), whereas Formula 10 tells us that this image can also be obtained by projecting (1, 1) onto the eigenspaces corresponding to $\lambda_1 = -3$ and $\lambda_2 = 2$ to obtain the vectors $\left(-\frac{1}{5}, \frac{2}{5}\right)$ and $\left(\frac{6}{5}, \frac{3}{5}\right)$, then scaling by the eigenvalues to obtain $\left(\frac{3}{5}, -\frac{6}{5}\right)$ and $\left(\frac{12}{5}, \frac{6}{5}\right)$, and then adding these vectors (see Figure 7.2.1).


Figure  7.2.1 
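The project, scale, and add reading of the spectral decomposition can be reproduced with exact rational arithmetic. The sketch below is illustrative only: the helper names `projection_matrix` and `mat_vec` are our own, and the eigenpairs are the ones stated in Example 2 for the matrix with rows (1, 2) and (2, -2). It uses the fact that $uu^T = vv^T/(v \cdot v)$ when $u = v/\|v\|$, which avoids square roots.

```python
from fractions import Fraction as F

A = [[F(1), F(2)], [F(2), F(-2)]]
# (eigenvalue, unnormalized eigenvector) pairs from Example 2.
pairs = [(F(-3), [F(1), F(-2)]), (F(2), [F(2), F(1)])]

def projection_matrix(v):
    """Standard matrix u u^T of the orthogonal projection onto span{v}."""
    vv = sum(c * c for c in v)
    return [[a * b / vv for b in v] for a in v]

def mat_vec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

x = [F(1), F(1)]
image = [F(0), F(0)]                      # accumulates the scaled projections of x
rebuilt = [[F(0), F(0)], [F(0), F(0)]]    # accumulates lambda * u u^T

for lam, v in pairs:
    proj = projection_matrix(v)
    px = mat_vec(proj, x)                 # projection of x onto the eigenspace
    image = [s + lam * p for s, p in zip(image, px)]
    rebuilt = [[r + lam * p for r, p in zip(rr, pr)]
               for rr, pr in zip(rebuilt, proj)]
    print(lam, [str(c) for c in px])

# Both comparisons should hold: the terms rebuild A, and the scaled
# projections add up to Ax.
print(rebuilt == A, image == mat_vec(A, x))
```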


The  Nondiagonalizable  Case 


If A is an n × n matrix that is not orthogonally diagonalizable, it may still be possible to achieve considerable simplification in the form of $P^TAP$ by choosing the orthogonal matrix P appropriately. We will consider two theorems (without proof) that illustrate this. The first, due to the German mathematician Issai Schur, states that every square matrix A with real entries and real eigenvalues is orthogonally similar to an upper triangular matrix that has the eigenvalues of A on the main diagonal.


Schur's  Theorem 


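If A is an n × n matrix with real entries and real eigenvalues, then there is an orthogonal matrix P such that $P^TAP$ is an upper triangular matrix of the form

$$P^TAP = \begin{bmatrix} \lambda_1 & \times & \times & \cdots & \times \\ 0 & \lambda_2 & \times & \cdots & \times \\ 0 & 0 & \lambda_3 & \cdots & \times \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$ (11)

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of the matrix A repeated according to multiplicity.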


Issai  Schur  (1875-1941) 

The  life  of  the  German  mathematician  Issai  Schur  is  a sad  reminder  of  the  effect  that  Nazi  policies 
had  on  Jewish  intellectuals  during  the  1930s.  Schur  was  a brilliant  mathematician  and  a popular  lecturer  who 
attracted  many  students  and  researchers  to  the  University  of  Berlin,  where  he  worked  and  taught.  His  lectures 
sometimes  attracted  so  many  students  that  opera  glasses  were  needed  to  see  him  from  the  back  row.  Schur's  life 
became  increasingly  difficult  under  Nazi  rule,  and  in  April  of  1933  he  was  forced  to  “retire”  from  the  university 
under  a law  that  prohibited  non- Aryans  from  holding  “civil  service”  positions.  There  was  an  outcry  from  many  of  his 
students and colleagues who respected and liked him, but it did not stave off his complete dismissal in 1935. Schur,
who thought of himself as a loyal German, never understood the persecution and humiliation he received at Nazi
hands.  He  left  Germany  for  Palestine  in  1939,  a broken  man.  Lacking  in  financial  resources,  he  had  to  sell  his 
beloved  mathematics  books  and  lived  in  poverty  until  his  death  in  1941. 

[Image:  Courtesy  Electronic  Publishing  Services,  Inc.,  New  York  City] 


It is common to denote the upper triangular matrix in 11 by S (for Schur), in which case that equation can be rewritten as

$$A = PSP^T$$ (12)

which is called a Schur decomposition of A.

The  next  theorem,  due  to  the  German  mathematician  and  engineer  Karl  Hessenberg  (1904-1959),  states  that  every  square 
matrix  with  real  entries  is  orthogonally  similar  to  a matrix  in  which  each  entry  below  the  first  subdiagonal  is  zero  (Figure 
7.2.2).  Such  a matrix  is  said  to  be  in  upper  Hessenberg  form. 

[Figure 7.2.2: a square matrix with its first subdiagonal labeled]


Hessenberg's  Theorem 


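If A is an n × n matrix, then there is an orthogonal matrix P such that $P^TAP$ is a matrix of the form

$$P^TAP = \begin{bmatrix} \times & \times & \cdots & \times & \times & \times \\ \times & \times & \cdots & \times & \times & \times \\ 0 & \times & \ddots & \times & \times & \times \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & \times & \times & \times \\ 0 & 0 & \cdots & 0 & \times & \times \end{bmatrix}$$ (13)

Note that unlike those in 11, the diagonal entries in 13 are usually not the eigenvalues of A.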


It is common to denote the upper Hessenberg matrix in 13 by H (for Hessenberg), in which case that equation can be rewritten as

$$A = PHP^T$$ (14)

which is called an upper Hessenberg decomposition of A.

In  many  numerical  algorithms  the  initial  matrix  is  first  converted  to  upper  Hessenberg  form  to  reduce  the  amount 
of  computation  in  subsequent  parts  of  the  algorithm.  Many  computer  packages  have  built-in  commands  for  finding  Schur  and 
Hessenberg  decompositions. 
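To give a sense of what such a built-in command does, here is a minimal sketch of one common approach, reduction by Householder reflections. This technique is not described in the text and is offered only as an illustration; the function name `hessenberg` and the sample matrix are our own, and production libraries use more careful implementations. Each step applies a symmetric orthogonal reflection Q on both sides, so the result is orthogonally similar to the input.

```python
import math

def hessenberg(A):
    """Return H = P^T A P in upper Hessenberg form (P itself is not returned)."""
    n = len(A)
    H = [[float(x) for x in row] for row in A]
    for k in range(n - 2):
        # Part of column k that lies below the diagonal entry.
        x = [H[i][k] for i in range(k + 1, n)]
        alpha = -math.copysign(math.sqrt(sum(t * t for t in x)), x[0])
        v = x[:]
        v[0] -= alpha                     # Householder vector v = x - alpha*e1
        vv = sum(t * t for t in v)
        if vv == 0.0:                     # column is already zero below the diagonal
            continue
        m = len(v)
        # Left-multiply by Q = I - 2 v v^T / (v^T v), acting on rows k+1..n-1.
        for j in range(n):
            f = 2.0 * sum(v[i] * H[k + 1 + i][j] for i in range(m)) / vv
            for i in range(m):
                H[k + 1 + i][j] -= f * v[i]
        # Right-multiply by the same Q, acting on columns k+1..n-1.
        for i in range(n):
            f = 2.0 * sum(H[i][k + 1 + j] * v[j] for j in range(m)) / vv
            for j in range(m):
                H[i][k + 1 + j] -= f * v[j]
    return H

A = [[4, 1, 2, 3], [1, 3, 0, 1], [2, 0, 2, 1], [5, 1, 1, 1]]
H = hessenberg(A)
for row in H:
    print([round(entry, 6) for entry in row])
```

Entries below the first subdiagonal come out as zero up to rounding error, and the trace of H matches the trace of A, as it must for similar matrices.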


Concept  Review 

Orthogonally  similar  matrices 


Orthogonally  diagonalizable  matrix 

Spectral  decomposition  (or  eigenvalue  decomposition) 

Schur  decomposition 

Subdiagonal 

Upper Hessenberg form

Upper Hessenberg decomposition

Skills 

Be  able  to  recognize  an  orthogonally  diagonalizable  matrix. 

Know  that  eigenvalues  of  symmetric  matrices  are  real  numbers. 

Know  that  for  a symmetric  matrix  eigenvectors  from  different  eigenspaces  are  orthogonal. 
Be  able  to  orthogonally  diagonalize  a symmetric  matrix. 

Be  able  to  find  the  spectral  decomposition  of  a symmetric  matrix. 

Know  the  statement  of  Schur’s  Theorem. 

Know the statement of Hessenberg's Theorem.


Exercise  Set  7.2 

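1. Find the characteristic equation of the given symmetric matrix, and then by inspection determine the dimensions of the eigenspaces.

(a) $\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & -4 & 2 \\ -4 & 1 & -2 \\ 2 & -2 & -2 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$

(d) $\begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}$

(e) $\begin{bmatrix} 4 & 4 & 0 & 0 \\ 4 & 4 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

(f) $\begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & 0 & 0 \\ 0 & 0 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}$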

Answer: 


(a) $\lambda^2 - 5\lambda = 0$; $\lambda = 0$: one-dimensional; $\lambda = 5$: one-dimensional

(b) $\lambda^3 - 27\lambda - 54 = 0$; $\lambda = 6$: one-dimensional; $\lambda = -3$: two-dimensional

(c) $\lambda^3 - 3\lambda^2 = 0$; $\lambda = 3$: one-dimensional; $\lambda = 0$: two-dimensional

(d) $\lambda^3 - 12\lambda^2 + 36\lambda - 32 = 0$; $\lambda = 2$: two-dimensional; $\lambda = 8$: one-dimensional

(e) $\lambda^4 - 8\lambda^3 = 0$; $\lambda = 0$: three-dimensional; $\lambda = 8$: one-dimensional

(f) $\lambda^4 - 8\lambda^3 + 22\lambda^2 - 24\lambda + 9 = 0$; $\lambda = 1$: two-dimensional; $\lambda = 3$: two-dimensional

In Exercises 2-9, find a matrix P that orthogonally diagonalizes A, and determine $P^{-1}AP$.


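5. $A = \begin{bmatrix} -2 & 0 & -36 \\ 0 & -3 & 0 \\ -36 & 0 & -23 \end{bmatrix}$

Answer:

6. $A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

7. $A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}$

8. $A = \begin{bmatrix} 3 & 1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

9. $A = \begin{bmatrix} -7 & 24 & 0 & 0 \\ 24 & 7 & 0 & 0 \\ 0 & 0 & -7 & 24 \\ 0 & 0 & 24 & 7 \end{bmatrix}$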


Answer: 


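$$P = \begin{bmatrix} -\frac{4}{5} & \frac{3}{5} & 0 & 0 \\ \frac{3}{5} & \frac{4}{5} & 0 & 0 \\ 0 & 0 & -\frac{4}{5} & \frac{3}{5} \\ 0 & 0 & \frac{3}{5} & \frac{4}{5} \end{bmatrix}; \quad P^{-1}AP = \begin{bmatrix} -25 & 0 & 0 & 0 \\ 0 & 25 & 0 & 0 \\ 0 & 0 & -25 & 0 \\ 0 & 0 & 0 & 25 \end{bmatrix}$$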


10. Assuming that $b \neq 0$, find a matrix that orthogonally diagonalizes

$$\begin{bmatrix} a & b \\ b & a \end{bmatrix}$$

11. Prove that if A is any m × n matrix, then $A^TA$ has an orthonormal set of n eigenvectors.

12. (a) Show that if v is any n × 1 matrix and I is the n × n identity matrix, then $I - vv^T$ is orthogonally diagonalizable.
(b) Find a matrix P that orthogonally diagonalizes $I - vv^T$ if

$$v = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$$

13. Use the result in Exercise 19 of Section 5.1 to prove Theorem 7.2.2a for 2 × 2 symmetric matrices.

14. Does there exist a 3 × 3 symmetric matrix with eigenvalues $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 7$ and corresponding eigenvectors

$$\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$

If so, find such a matrix; if not, explain why not.

15. Is the converse of Theorem 7.2.2b true? Explain.

Answer: 

No 


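16. Find the spectral decomposition of each matrix.

(a) $\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$

(b) $\begin{bmatrix} 6 & -2 \\ -2 & 3 \end{bmatrix}$

(c) $\begin{bmatrix} -3 & 1 & 2 \\ 1 & -3 & 2 \\ 2 & 2 & 0 \end{bmatrix}$

(d) $\begin{bmatrix} -2 & 0 & -36 \\ 0 & -3 & 0 \\ -36 & 0 & -23 \end{bmatrix}$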


17. Show that if A is a symmetric orthogonal matrix, then 1 and -1 are the only possible eigenvalues.

18. (a) Find a 3 × 3 symmetric matrix whose eigenvalues are $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 7$ and for which the corresponding eigenvectors are $v_1 = (0, 1, -1)$, $v_2 = (1, 0, 0)$, $v_3 = (0, 1, 1)$.

(b) Is there a 3 × 3 symmetric matrix with eigenvalues $\lambda_1 = -1$, $\lambda_2 = 3$, $\lambda_3 = 7$ and corresponding eigenvectors $v_1 = (0, 1, -1)$, $v_2 = (1, 0, 0)$, $v_3 = (1, 1, 1)$? Explain your reasoning.

19. Let A be a diagonalizable matrix with the property that eigenvectors from distinct eigenvalues are orthogonal. Must A be symmetric? Explain your reasoning.


Answer: 


Yes 


20. Prove: If $\{u_1, u_2, \ldots, u_n\}$ is an orthonormal basis for $R^n$, and if A can be expressed as

$$A = c_1 u_1 u_1^T + c_2 u_2 u_2^T + \cdots + c_n u_n u_n^T$$

then A is symmetric and has eigenvalues $c_1, c_2, \ldots, c_n$.


21.  In  this  exercise  we  will  establish  that  a matrix  A is  orthogonally  diagonalizable  if  and  only  if  it  is  symmetric.  We  have 
shown  that  an  orthogonally  diagonalizable  matrix  is  symmetric.  The  harder  part  is  to  prove  that  a symmetric  matrix  A is 
orthogonally  diagonalizable.  We  will  proceed  in  two  steps:  first  we  will  show  that  A is  diagonalizable,  and  then  we  will 
build  on  that  result  to  show  that  A is  orthogonally  diagonalizable. 


(a) Assume that A is a symmetric n × n matrix. One way to prove that A is diagonalizable is to show that for each eigenvalue $\lambda_0$ the geometric multiplicity is equal to the algebraic multiplicity. For this purpose, assume that the geometric multiplicity of $\lambda_0$ is k, let $B_0 = \{u_1, u_2, \ldots, u_k\}$ be an orthonormal basis for the eigenspace corresponding to $\lambda_0$, extend this to an orthonormal basis $B = \{u_1, u_2, \ldots, u_n\}$ for $R^n$, and let P be the matrix having the vectors of B as columns. As shown in Exercise 34(b) of Section 5.2, the product AP can be written as

$$AP = P\begin{bmatrix} \lambda_0 I_k & X \\ 0 & Y \end{bmatrix}$$

Use the fact that B is an orthonormal basis to prove that X = 0 [a zero matrix of size k × (n - k)].

(b) It follows from part (a) and Exercise 34(c) of Section 5.2 that A has the same characteristic polynomial as

$$C = \begin{bmatrix} \lambda_0 I_k & 0 \\ 0 & Y \end{bmatrix}$$

Use this fact and Exercise 34(d) of Section 5.2 to prove that the algebraic multiplicity of $\lambda_0$ is the same as the geometric multiplicity of $\lambda_0$. This establishes that A is diagonalizable.

(c) Use Theorem 7.2.2(b) and the fact that A is diagonalizable to prove that A is orthogonally diagonalizable.


True-False  Exercises 


In parts (a)-(g) determine whether the statement is true or false, and justify your answer.

(a) If A is a square matrix, then $AA^T$ and $A^TA$ are orthogonally diagonalizable.

Answer: 


True 


(b) If $v_1$ and $v_2$ are eigenvectors from distinct eigenspaces of a symmetric matrix, then $\|v_1 + v_2\|^2 = \|v_1\|^2 + \|v_2\|^2$.

Answer: 

True 

(c)  Every  orthogonal  matrix  is  orthogonally  diagonalizable. 

Answer: 

False 

(d) If A is both invertible and orthogonally diagonalizable, then $A^{-1}$ is orthogonally diagonalizable.

Answer: 

True 

(e)  Every  eigenvalue  of  an  orthogonal  matrix  has  absolute  value  1 . 

Answer: 

True 

(f)  If  A is  an  n x n orthogonally  diagonalizable  matrix,  then  there  exists  an  orthonormal  basis  for  Rn  consisting  of 
eigenvectors  of  A. 

Answer: 

True

(g) If A is orthogonally diagonalizable, then A has real eigenvalues.

Answer: 

True 
7.3 Quadratic Forms

In  this  section  we  will  use  matrix  methods  to  study  real-valued  functions  of  several  variables  in  which  each  term  is  either  the 
square  of  a variable  or  the  product  of  two  variables.  Such  functions  arise  in  a variety  of  applications,  including  geometry, 
vibrations  of  mechanical  systems,  statistics,  and  electrical  engineering. 


Definition  of  a Quadratic  Form 


Expressions of the form

$$a_1x_1 + a_2x_2 + \cdots + a_nx_n$$

occurred in our study of linear equations and linear systems. If $a_1, a_2, \ldots, a_n$ are treated as fixed constants, then this expression is a real-valued function of the n variables $x_1, x_2, \ldots, x_n$ and is called a linear form on $R^n$. All variables in a linear form occur to the first power and there are no products of variables. Here we will be concerned with quadratic forms on $R^n$, which are functions of the form

$$a_1x_1^2 + a_2x_2^2 + \cdots + a_nx_n^2 + (\text{terms } a_kx_ix_j \text{ in which } x_i \neq x_j)$$

The terms of the form $a_kx_ix_j$ are called cross product terms. It is common to combine the cross product terms involving $x_ix_j$ with those involving $x_jx_i$ to avoid duplication. Thus, a general quadratic form on $R^2$ would typically be expressed as

$$a_1x_1^2 + a_2x_2^2 + 2a_3x_1x_2$$ (1)

and a general quadratic form on $R^3$ as

$$a_1x_1^2 + a_2x_2^2 + a_3x_3^2 + 2a_4x_1x_2 + 2a_5x_1x_3 + 2a_6x_2x_3$$ (2)

If, as usual, we do not distinguish between the number a and the 1 × 1 matrix [a], and if we let x be the column vector of variables, then 1 and 2 can be expressed in matrix form as

$$\begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} a_1 & a_3 \\ a_3 & a_2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x^TAx$$

$$\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}\begin{bmatrix} a_1 & a_4 & a_5 \\ a_4 & a_2 & a_6 \\ a_5 & a_6 & a_3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x^TAx$$

(verify). Note that the matrix A in these formulas is symmetric, that its diagonal entries are the coefficients of the squared terms, and its off-diagonal entries are half the coefficients of the cross product terms. In general, if A is a symmetric n × n matrix and x is an n × 1 column vector of variables, then we call the function

$$Q_A(x) = x^TAx$$ (3)

the quadratic form associated with A. When convenient, 3 can be expressed in dot product notation as

$$x^TAx = x \cdot Ax = Ax \cdot x$$ (4)


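In the case where A is a diagonal matrix, the quadratic form $x^TAx$ has no cross product terms; for example, if A has diagonal entries $\lambda_1, \lambda_2, \ldots, \lambda_n$, then

$$x^TAx = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \lambda_1x_1^2 + \lambda_2x_2^2 + \cdots + \lambda_nx_n^2$$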

EXAMPLE  1 Expressing  Quadratic  Forms  in  Matrix  Notation 


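In each part, express the quadratic form in the matrix notation $x^TAx$, where A is symmetric.

(a) $2x^2 + 6xy - 5y^2$

(b) $x_1^2 + 7x_2^2 - 3x_3^2 + 4x_1x_2 - 2x_1x_3 + 8x_2x_3$

The diagonal entries of A are the coefficients of the squared terms, and the off-diagonal entries are half the coefficients of the cross product terms, so

$$2x^2 + 6xy - 5y^2 = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 2 & 3 \\ 3 & -5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$

$$x_1^2 + 7x_2^2 - 3x_3^2 + 4x_1x_2 - 2x_1x_3 + 8x_2x_3 = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}\begin{bmatrix} 1 & 2 & -1 \\ 2 & 7 & 4 \\ -1 & 4 & -3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$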
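The rule used in this example (squared-term coefficients on the diagonal, half of each cross product coefficient off the diagonal) is mechanical enough to code. The following sketch is only an illustration; the function names `quadratic_form_matrix` and `evaluate` and the 0-based indexing convention are our own choices.

```python
def quadratic_form_matrix(n, squares, cross):
    """Build the symmetric matrix A of a quadratic form in n variables.

    squares[i]    : coefficient of x_i^2 (0-based index)
    cross[(i, j)] : coefficient of the combined cross product term x_i*x_j, i != j
    """
    A = [[0] * n for _ in range(n)]
    for i, coeff in enumerate(squares):
        A[i][i] = coeff
    for (i, j), coeff in cross.items():
        A[i][j] = A[j][i] = coeff / 2      # half on each side of the diagonal
    return A

def evaluate(A, x):
    """Compute x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

# Part (b): x1^2 + 7*x2^2 - 3*x3^2 + 4*x1*x2 - 2*x1*x3 + 8*x2*x3
A = quadratic_form_matrix(3, [1, 7, -3], {(0, 1): 4, (0, 2): -2, (1, 2): 8})
x = [1, 2, 3]
direct = 1*1 + 7*4 - 3*9 + 4*1*2 - 2*1*3 + 8*2*3   # the polynomial evaluated at x
print(A)
print(evaluate(A, x), direct)
```

Evaluating the polynomial directly and through $x^TAx$ at the same point is a quick way to catch a misplaced or unhalved coefficient.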

Change  of  Variable  in  a Quadratic  Form 

There  are  three  important  kinds  of  problems  that  occur  in  applications  of  quadratic  forms: 


If $x^TAx$ is a quadratic form on $R^2$ or $R^3$, what kind of curve or surface is represented by the equation $x^TAx = k$?

If $x^TAx$ is a quadratic form on $R^n$, what conditions must A satisfy for $x^TAx$ to have positive values for $x \neq 0$?

If $x^TAx$ is a quadratic form on $R^n$, what are its maximum and minimum values if x is constrained to satisfy $\|x\| = 1$?


We  will  consider  the  first  two  problems  in  this  section  and  the  third  problem  in  the  next  section. 

Many of the techniques for solving these problems are based on simplifying the quadratic form $x^TAx$ by making a substitution

$$x = Py$$ (5)

that expresses the variables $x_1, x_2, \ldots, x_n$ in terms of new variables $y_1, y_2, \ldots, y_n$. If P is invertible, then we call 5 a change of variable, and if P is orthogonal, then we call 5 an orthogonal change of variable.

If we make the change of variable x = Py in the quadratic form $x^TAx$, then we obtain

$$x^TAx = (Py)^TA(Py) = y^TP^TAPy = y^T(P^TAP)y$$ (6)

Since the matrix $B = P^TAP$ is symmetric (verify), the effect of the change of variable is to produce a new quadratic form $y^TBy$ in the variables $y_1, y_2, \ldots, y_n$. In particular, if we choose P to orthogonally diagonalize A, then the new quadratic form will be $y^TDy$, where D is a diagonal matrix with the eigenvalues of A on the main diagonal; that is,

$$x^TAx = y^TDy = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2$$


Thus,  we  have  the  following  result,  called  the  principal  axes  theorem. 


The  Principal  Axes  Theorem 

If A is a symmetric n × n matrix, then there is an orthogonal change of variable that transforms the quadratic form $x^TAx$ into a quadratic form $y^TDy$ with no cross product terms. Specifically, if P orthogonally diagonalizes A, then making the change of variable x = Py in the quadratic form $x^TAx$ yields the quadratic form

$$x^TAx = y^TDy = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2$$

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A corresponding to the eigenvectors that form the successive columns of P.


EXAMPLE  2 An  Illustration  of  the  Principal  Axes  Theorem 


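Find an orthogonal change of variable that eliminates the cross product terms in the quadratic form $Q = x_1^2 - x_3^2 - 4x_1x_2 + 4x_2x_3$, and express Q in terms of the new variables.

The quadratic form can be expressed in matrix notation as

$$Q = x^TAx = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}\begin{bmatrix} 1 & -2 & 0 \\ -2 & 0 & 2 \\ 0 & 2 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

The characteristic equation of the matrix A is

$$\begin{vmatrix} \lambda - 1 & 2 & 0 \\ 2 & \lambda & -2 \\ 0 & -2 & \lambda + 1 \end{vmatrix} = \lambda^3 - 9\lambda = \lambda(\lambda + 3)(\lambda - 3) = 0$$

so the eigenvalues are $\lambda = 0, -3, 3$. We leave it for you to show that orthonormal bases for the three eigenspaces are

$$\lambda = 0: \begin{bmatrix} \frac{2}{3} \\ \frac{1}{3} \\ \frac{2}{3} \end{bmatrix}, \quad \lambda = -3: \begin{bmatrix} -\frac{1}{3} \\ -\frac{2}{3} \\ \frac{2}{3} \end{bmatrix}, \quad \lambda = 3: \begin{bmatrix} -\frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{bmatrix}$$

Thus, a substitution x = Py that eliminates the cross product terms is

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \frac{2}{3} & -\frac{1}{3} & -\frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$

This produces the new quadratic form

$$Q = y^T(P^TAP)y = \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix}\begin{bmatrix} 0 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 3 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = -3y_2^2 + 3y_3^2$$

in which there are no cross product terms.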
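Because every entry of P in this example is a multiple of 1/3, the claim can be confirmed with exact fractions. The sketch below is illustrative only (the helpers `matmul`, `transpose`, and `Q` and the sample point y are our own): it forms $P^TAP$ for the matrices of Example 2 and compares the original form at x = Py with $-3y_2^2 + 3y_3^2$.

```python
from fractions import Fraction as F

A = [[1, -2, 0], [-2, 0, 2], [0, 2, -1]]
# Columns of P: unit eigenvectors for lambda = 0, -3, 3, in that order.
P = [[F(2, 3), F(-1, 3), F(-2, 3)],
     [F(1, 3), F(-2, 3), F(2, 3)],
     [F(2, 3), F(2, 3), F(1, 3)]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

D = matmul(transpose(P), matmul(A, P))     # should be diagonal with entries 0, -3, 3

def Q(x):
    """The original quadratic form x1^2 - x3^2 - 4*x1*x2 + 4*x2*x3."""
    return x[0]**2 - x[2]**2 - 4*x[0]*x[1] + 4*x[1]*x[2]

# Pick any y, map it to x = P y, and compare the two expressions for Q.
y = [F(1), F(2), F(-1)]
x = [sum(P[i][j] * y[j] for j in range(3)) for i in range(3)]
print([[int(e) for e in row] for row in D])
print(Q(x), -3 * y[1]**2 + 3 * y[2]**2)
```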


If A is a symmetric n × n matrix, then the quadratic form $x^TAx$ is a real-valued function whose range is the set of all possible values for $x^TAx$ as x varies over $R^n$. It can be shown that an orthogonal change of variable x = Py does not alter the range of a quadratic form; that is, the set of all values for $x^TAx$ as x varies over $R^n$ is the same as the set of all values for $y^T(P^TAP)y$ as y varies over $R^n$.


Quadratic  Forms  in  Geometry 

Recall  that  a conic  section  or  conic  is  a curve  that  results  by  cutting  a double-napped  cone  with  a plane  (Figure  7.3.1).  The  most 
important  conic  sections  are  ellipses,  hyperbolas,  and  parabolas,  which  result  when  the  cutting  plane  does  not  pass  through  the 
vertex.  Circles  are  special  cases  of  ellipses  that  result  when  the  cutting  plane  is  perpendicular  to  the  axis  of  symmetry  of  the 
cone.  If  the  cutting  plane  passes  through  the  vertex,  then  the  resulting  intersection  is  called  a degenerate  conic.  The  possibilities 
are  a point,  a pair  of  intersecting  lines,  or  a single  line. 


[Figure 7.3.1: a double-napped cone cut by a plane to give a circle, an ellipse, a parabola, and a hyperbola]

[Figure 7.3.2: a central conic rotated out of standard position]


Quadratic forms in $R^2$ arise naturally in the study of conic sections. For example, it is shown in analytic geometry that an equation of the form

$$ax^2 + 2bxy + cy^2 + dx + ey + f = 0$$ (7)

in which a, b, and c are not all zero, represents a conic section. If d = e = 0 in 7, then there are no linear terms, so the equation becomes

$$ax^2 + 2bxy + cy^2 + f = 0$$ (8)

and is said to represent a central conic. These include circles, ellipses, and hyperbolas, but not parabolas. Furthermore, if b = 0 in 8, then there is no cross product term (i.e., term involving xy), and the equation

$$ax^2 + cy^2 + f = 0$$ (9)

is said to represent a central conic in standard position. The most important conics of this type are shown in Table 1.

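Table 1

Ellipse: $\frac{x^2}{\alpha^2} + \frac{y^2}{\beta^2} = 1$ $(\alpha \geq \beta > 0)$

Ellipse: $\frac{x^2}{\alpha^2} + \frac{y^2}{\beta^2} = 1$ $(\beta \geq \alpha > 0)$

Hyperbola: $\frac{x^2}{\alpha^2} - \frac{y^2}{\beta^2} = 1$ $(\alpha > 0, \beta > 0)$

Hyperbola: $\frac{y^2}{\alpha^2} - \frac{x^2}{\beta^2} = 1$ $(\alpha > 0, \beta > 0)$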

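If we take the constant f in Equations 8 and 9 to the right side and let k = -f, then we can rewrite these equations in matrix form as

$$\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & b \\ b & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = k \quad \text{and} \quad \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & 0 \\ 0 & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = k$$ (10)

The first of these corresponds to Equation 8 in which there is a cross product term 2bxy, and the second corresponds to Equation 9 in which there is no cross product term. Geometrically, the existence of a cross product term signals that the graph of the quadratic form is rotated about the origin, as in Figure 7.3.2. The three-dimensional analogs of the equations in 10 are

$$\begin{bmatrix} x & y & z \end{bmatrix}\begin{bmatrix} a & d & e \\ d & b & f \\ e & f & c \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = k \quad \text{and} \quad \begin{bmatrix} x & y & z \end{bmatrix}\begin{bmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = k$$ (11)

If a, b, and c are not all zero, then the graphs of these equations in $R^3$ are called central quadrics in standard position.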


Identifying  Conic  Sections 

We are now ready to consider the first of the three problems posed earlier, identifying the curve or surface represented by an equation $x^TAx = k$ in two or three variables. We will focus on the two-variable case. We noted above that an equation of the form

$$ax^2 + 2bxy + cy^2 + f = 0$$ (12)

represents a central conic. If b = 0, then the conic is in standard position, and if $b \neq 0$, it is rotated. It is an easy matter to identify central conics in standard position by matching the equation with one of the standard forms. For example, the equation

$$9x^2 + 16y^2 - 144 = 0$$

can be rewritten as

$$\frac{x^2}{16} + \frac{y^2}{9} = 1$$

which, by comparison with Table 1, is the ellipse shown in Figure 7.3.3.

[Figure 7.3.3: the ellipse $\frac{x^2}{16} + \frac{y^2}{9} = 1$]

If a central conic is rotated out of standard position, then it can be identified by first rotating the coordinate axes to put it in standard position and then matching the resulting equation with one of the standard forms in Table 1. To find a rotation that eliminates the cross product term in the equation

$$ax^2 + 2bxy + cy^2 = k$$ (13)

it will be convenient to express the equation in the matrix form

$$x^TAx = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & b \\ b & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = k$$ (14)

and look for a change of variable

$$x = Px'$$

that diagonalizes A and for which det(P) = 1. Since we saw in Example 4 of Section 7.1 that the transition matrix

$$P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$ (15)

has the effect of rotating the xy-axes of a rectangular coordinate system through an angle $\theta$, our problem reduces to finding $\theta$ that diagonalizes A, thereby eliminating the cross product term in 13. If we make this change of variable, then in the x'y'-coordinate system, Equation 14 will become

$$x'^TDx' = \begin{bmatrix} x' & y' \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\begin{bmatrix} x' \\ y' \end{bmatrix} = k$$ (16)

where $\lambda_1$ and $\lambda_2$ are the eigenvalues of A. The conic can now be identified by writing 16 in the form

$$\lambda_1x'^2 + \lambda_2y'^2 = k$$ (17)

and performing the necessary algebra to match it with one of the standard forms in Table 1. For example, if $\lambda_1$, $\lambda_2$, and k are positive, then 17 represents an ellipse with an axis of length $2\sqrt{k/\lambda_1}$ in the x'-direction and $2\sqrt{k/\lambda_2}$ in the y'-direction. The first column vector of P, which is a unit eigenvector corresponding to $\lambda_1$, is along the positive x'-axis; and the second column vector of P, which is a unit eigenvector corresponding to $\lambda_2$, is a unit vector along the y'-axis. These are called the principal axes of the ellipse, which explains why Theorem 7.3.1 is called “the principal axes theorem.” (See Figure 7.3.4.)


EXAMPLE  3 Identifying  a Conic  by  Eliminating  the  Cross  Product  Term 

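(a) Identify the conic whose equation is $5x^2 - 4xy + 8y^2 - 36 = 0$ by rotating the xy-axes to put the conic in standard position.

(b) Find the angle $\theta$ through which you rotated the xy-axes in part (a).

Solution (a)

The given equation can be written in the matrix form

$$x^TAx = 36$$

where

$$A = \begin{bmatrix} 5 & -2 \\ -2 & 8 \end{bmatrix}$$

The characteristic polynomial of A is

$$\begin{vmatrix} \lambda - 5 & 2 \\ 2 & \lambda - 8 \end{vmatrix} = (\lambda - 4)(\lambda - 9)$$

so the eigenvalues are $\lambda = 4$ and $\lambda = 9$. We leave it for you to show that orthonormal bases for the eigenspaces are

$$\lambda = 4: \begin{bmatrix} \frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}, \quad \lambda = 9: \begin{bmatrix} -\frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{bmatrix}$$

Thus, A is orthogonally diagonalized by

$$P = \begin{bmatrix} \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{bmatrix}$$ (18)

Had it turned out that det(P) = -1, then we would have interchanged the columns to reverse the sign.

Moreover, it happens by chance that det(P) = 1, so we are assured that the substitution x = Px' performs a rotation of axes. It follows from 16 that the equation of the conic in the x'y'-coordinate system is

$$\begin{bmatrix} x' & y' \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & 9 \end{bmatrix}\begin{bmatrix} x' \\ y' \end{bmatrix} = 36$$

which we can write as

$$4x'^2 + 9y'^2 = 36 \quad \text{or} \quad \frac{x'^2}{9} + \frac{y'^2}{4} = 1$$

We can now see from Table 1 that the conic is an ellipse whose axis has length $2\alpha = 6$ in the x'-direction and length $2\beta = 4$ in the y'-direction.

Solution (b)

It follows from 15 that

$$P = \begin{bmatrix} \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

which implies that

$$\cos\theta = \frac{2}{\sqrt{5}}, \quad \sin\theta = \frac{1}{\sqrt{5}}, \quad \tan\theta = \frac{\sin\theta}{\cos\theta} = \frac{1}{2}$$

Thus, $\theta = \tan^{-1}\frac{1}{2} \approx 26.6^\circ$ (Figure 7.3.5).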


Figure  7.3.5 


In the exercises we will ask you to show that if $b \neq 0$, then the cross product term in the equation

$$ax^2 + 2bxy + cy^2 = k$$

can be eliminated by a rotation through an angle $\theta$ that satisfies

$$\cot 2\theta = \frac{a - c}{2b}$$

We leave it for you to confirm that this is consistent with part (b) of the last example.
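A numerical version of that consistency check is sketched below. It is an illustration under our own conventions, not the text's method: the function names are hypothetical, and the angle is normalized into the interval $(0, \pi/2)$, which is one of several equivalent choices. The second function substitutes $x = x'\cos\theta - y'\sin\theta$, $y = x'\sin\theta + y'\cos\theta$ into $ax^2 + 2bxy + cy^2$ and collects coefficients.

```python
import math

def rotation_angle(a, b, c):
    """Angle theta in (0, pi/2) with cot(2*theta) = (a - c)/(2*b); assumes b != 0."""
    two_theta = math.atan2(2 * b, a - c)   # tan(2*theta) = 2b/(a - c)
    # Shifting theta by a multiple of pi/2 shifts 2*theta by a multiple of pi,
    # which leaves cot(2*theta) unchanged.
    return (two_theta / 2) % (math.pi / 2)

def rotated_coefficients(a, b, c, theta):
    """Coefficients (a2, b2, c2) of a2*x'^2 + 2*b2*x'y' + c2*y'^2 after rotation."""
    cs, sn = math.cos(theta), math.sin(theta)
    a2 = a * cs * cs + 2 * b * cs * sn + c * sn * sn
    b2 = (c - a) * cs * sn + b * (cs * cs - sn * sn)
    c2 = a * sn * sn - 2 * b * cs * sn + c * cs * cs
    return a2, b2, c2

# The conic 5x^2 - 4xy + 8y^2 = 36 has a = 5, 2b = -4, c = 8.
theta = rotation_angle(5, -2, 8)
a2, b2, c2 = rotated_coefficients(5, -2, 8, theta)
print(math.degrees(theta), a2, b2, c2)
```

For this conic the angle agrees with $\tan^{-1}\frac{1}{2}$, the cross product coefficient vanishes up to rounding, and the remaining coefficients are the eigenvalues 4 and 9.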


Positive  Definite  Quadratic  Forms 

We will now consider the second of the three problems posed earlier, determining conditions under which $x^TAx > 0$ for all nonzero values of x. We will explain why this is important shortly, but first we introduce some terminology.

The terminology in Definition 1 also applies to the matrix A; that is, A is positive definite, negative definite, or indefinite in accordance with whether the associated quadratic form has that property.

DEFINITION 1

A quadratic form $x^TAx$ is said to be

positive definite if $x^TAx > 0$ for $x \neq 0$

negative definite if $x^TAx < 0$ for $x \neq 0$

indefinite if $x^TAx$ has both positive and negative values


The  following  theorem,  whose  proof  is  deferred  to  the  end  of  the  section,  provides  a way  of  using  eigenvalues  to  determine 
whether  a matrix  A and  its  associated  quadratic  form  xTAx  are  positive  definite,  negative  definite,  or  indefinite. 


THEOREM  7.3.2 

If  A is  a symmetric  matrix,  then: 

(a) $\mathbf{x}^TA\mathbf{x}$ is positive definite if and only if all eigenvalues of A are positive.

(b) $\mathbf{x}^TA\mathbf{x}$ is negative definite if and only if all eigenvalues of A are negative.

(c) $\mathbf{x}^TA\mathbf{x}$ is indefinite if and only if A has at least one positive eigenvalue and at least one negative eigenvalue.


The three classifications in Definition 1 do not exhaust all of the possibilities. For example, a quadratic form for which $\mathbf{x}^TA\mathbf{x} \geq 0$ if $\mathbf{x} \neq \mathbf{0}$ is called positive semidefinite, and one for which $\mathbf{x}^TA\mathbf{x} \leq 0$ if $\mathbf{x} \neq \mathbf{0}$ is called negative semidefinite. Every positive definite form is positive semidefinite, but not conversely, and every negative definite form is negative semidefinite, but not conversely (why?). By adjusting the proof of Theorem 7.3.2 appropriately, one can prove that $\mathbf{x}^TA\mathbf{x}$ is positive semidefinite if and only if all eigenvalues of A are nonnegative and is negative semidefinite if and only if all eigenvalues of A are nonpositive.
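For a symmetric 2 x 2 matrix the eigenvalue test can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function names `eigenvalues_sym2` and `classify` and the tolerance `tol` are hypothetical, and the eigenvalues come from the quadratic formula applied to the characteristic polynomial of $\begin{bmatrix} a & b \\ b & c \end{bmatrix}$.

```python
import math

def eigenvalues_sym2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean + radius, mean - radius

def classify(a, b, c, tol=1e-12):
    """Classify [[a, b], [b, c]] by the signs of its eigenvalues."""
    largest, smallest = eigenvalues_sym2(a, b, c)
    if smallest > tol:
        return "positive definite"
    if largest < -tol:
        return "negative definite"
    if smallest < -tol and largest > tol:
        return "indefinite"
    if smallest >= -tol:
        # all eigenvalues nonnegative, at least one (numerically) zero;
        # the zero matrix also lands here although it is negative semidefinite too
        return "positive semidefinite"
    return "negative semidefinite"

print(classify(1, 0, 2))    # diagonal entries 1 and 2
print(classify(-1, 0, 2))   # diagonal entries -1 and 2
print(classify(1, 0, 0))    # diagonal entries 1 and 0
```

The tolerance only guards against roundoff when an eigenvalue is meant to be exactly zero; with exact arithmetic it could be taken as zero.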


EXAMPLE  4 Positive  Definite  Quadratic  Forms 


It  is  not  usually  possible  to  tell  from  the  signs  of  the  entries  in  a symmetric  matrix  A whether  that  matrix  is 
positive  definite,  negative  definite,  or  indefinite.  For  example,  the  entries  of  the  matrix 


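$$A = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{bmatrix}$$

are nonnegative, but the matrix is indefinite since its eigenvalues are $\lambda = 1, 4, -2$ (verify). To see this another way, let us write out the quadratic form as

$$\mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 3 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 3x_1^2 + 2x_1x_2 + 2x_1x_3 + 4x_2x_3$$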

Positive  definite  and  negative  definite  matrices 
are  invertible.  Why? 


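We can now see, for example, that

$$\mathbf{x}^TA\mathbf{x} = 4 \quad\text{for}\quad x_1 = 0,\ x_2 = 1,\ x_3 = 1$$

and

$$\mathbf{x}^TA\mathbf{x} = -4 \quad\text{for}\quad x_1 = 0,\ x_2 = 1,\ x_3 = -1$$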
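The two evaluations in this example are easy to reproduce. The sketch below is an illustration only; the function name `quadratic_form` is hypothetical, and the matrix is stored as a list of rows.

```python
def quadratic_form(A, x):
    """Evaluate x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

A = [[3, 1, 1],
     [1, 0, 2],
     [1, 2, 0]]

positive_value = quadratic_form(A, [0, 1, 1])    # x1 = 0, x2 = 1, x3 = 1
negative_value = quadratic_form(A, [0, 1, -1])   # x1 = 0, x2 = 1, x3 = -1
print(positive_value, negative_value)
```

Since the form takes one positive and one negative value, it is indefinite even though no entry of A is negative.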

Classifying  Conic  Sections  Using  Eigenvalues 

If $\mathbf{x}^TB\mathbf{x} = k$ is the equation of a conic, and if $k \neq 0$, then we can divide through by k and rewrite the equation in the form

$$\mathbf{x}^TA\mathbf{x} = 1 \tag{20}$$

where $A = (1/k)B$. If we now rotate the coordinate axes to eliminate the cross product term (if any) in this equation, then the equation of the conic in the new coordinate system will be of the form

$$\lambda_1 x'^2 + \lambda_2 y'^2 = 1 \tag{21}$$

in which $\lambda_1$ and $\lambda_2$ are the eigenvalues of A. The particular type of conic represented by this equation will depend on the signs of the eigenvalues $\lambda_1$ and $\lambda_2$. For example, you should be able to see from (21) that:

• $\mathbf{x}^TA\mathbf{x} = 1$ represents an ellipse if $\lambda_1 > 0$ and $\lambda_2 > 0$.

• $\mathbf{x}^TA\mathbf{x} = 1$ has no graph if $\lambda_1 < 0$ and $\lambda_2 < 0$.

• $\mathbf{x}^TA\mathbf{x} = 1$ represents a hyperbola if $\lambda_1$ and $\lambda_2$ have opposite signs.

In the case of the ellipse, Equation (21) can be rewritten as

$$\frac{x'^2}{(1/\sqrt{\lambda_1})^2} + \frac{y'^2}{(1/\sqrt{\lambda_2})^2} = 1 \tag{22}$$

so the axes of the ellipse have lengths $2/\sqrt{\lambda_1}$ and $2/\sqrt{\lambda_2}$ (Figure 7.3.6).


The  following  theorem  is  an  immediate  consequence  of  this  discussion  and  Theorem  7.3.2. 


THEOREM  7.3.3 


If  A is  a symmetric  2x2  matrix,  then: 

(a) $\mathbf{x}^TA\mathbf{x} = 1$ represents an ellipse if A is positive definite.

(b) $\mathbf{x}^TA\mathbf{x} = 1$ has no graph if A is negative definite.

(c) $\mathbf{x}^TA\mathbf{x} = 1$ represents a hyperbola if A is indefinite.


In Example 3 we performed a rotation to show that the equation

$$5x^2 - 4xy + 8y^2 - 36 = 0$$

represents an ellipse with a major axis of length 6 and a minor axis of length 4. This conclusion can also be obtained by rewriting the equation in the form

$$\tfrac{5}{36}x^2 - \tfrac{1}{9}xy + \tfrac{2}{9}y^2 = 1$$

and showing that the associated matrix

$$A = \begin{bmatrix} \frac{5}{36} & -\frac{1}{18} \\ -\frac{1}{18} & \frac{2}{9} \end{bmatrix}$$

has eigenvalues $\lambda_1 = \frac{1}{9}$ and $\lambda_2 = \frac{1}{4}$. These eigenvalues are positive, so the matrix A is positive definite and the equation represents an ellipse. Moreover, it follows from (21) that the axes of the ellipse have lengths $2/\sqrt{\lambda_1} = 6$ and $2/\sqrt{\lambda_2} = 4$, which is consistent with Example 3.
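The eigenvalue classification of $\mathbf{x}^TA\mathbf{x} = 1$ for a symmetric 2 x 2 matrix can be sketched as follows. The function name `classify_conic` is hypothetical, and the "degenerate" branch (a zero eigenvalue) is an added assumption covering a case that the discussion above does not treat.

```python
import math

def classify_conic(a, b, c):
    """Classify x^T A x = 1 for the symmetric matrix A = [[a, b], [b, c]].

    Returns the type of conic and, for an ellipse, the axis lengths
    (2/sqrt(smaller eigenvalue), 2/sqrt(larger eigenvalue)).
    """
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lam_small, lam_large = mean - radius, mean + radius
    if lam_small > 0:
        return "ellipse", (2 / math.sqrt(lam_small), 2 / math.sqrt(lam_large))
    if lam_large < 0:
        return "no graph", None
    if lam_small < 0 < lam_large:
        return "hyperbola", None
    return "degenerate (zero eigenvalue)", None

# 5x^2 - 4xy + 8y^2 = 36, divided through by 36
kind, axes = classify_conic(5 / 36, -1 / 18, 2 / 9)
print(kind, axes)
```

For this matrix the sketch reports an ellipse with axes of length approximately 6 and 4.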


Identifying  Positive  Definite  Matrices 

Positive  definite  matrices  are  the  most  important  symmetric  matrices  in  applications,  so  it  will  be  useful  to  learn  a little  more 
about  them.  We  already  know  that  a symmetric  matrix  is  positive  definite  if  and  only  if  its  eigenvalues  are  all  positive;  now  we 
will  give  a criterion  that  can  be  used  to  determine  whether  a symmetric  matrix  is  positive  definite  without  finding  the 
eigenvalues. For this purpose we define the kth principal submatrix of an $n \times n$ matrix A to be the $k \times k$ submatrix consisting of the first k rows and columns of A. For example, here are the principal submatrices of a general $4 \times 4$ matrix:

First principal submatrix: $\begin{bmatrix} a_{11} \end{bmatrix}$

Second principal submatrix: $\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$

Third principal submatrix: $\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$

Fourth principal submatrix $= A$: $\begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix}$

The  following  theorem,  which  we  state  without  proof,  provides  a determinant  test  for  ascertaining  whether  a symmetric  matrix  is 
positive  definite. 


THEOREM  7.3.4 

A symmetric  matrix  A is  positive  definite  if  and  only  if  the  determinant  of  every  principal  submatrix  is  positive. 


EXAMPLE  5 Working  with  Principal  Submatrices 


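The matrix

$$A = \begin{bmatrix} 2 & -1 & -3 \\ -1 & 2 & 4 \\ -3 & 4 & 9 \end{bmatrix}$$

is positive definite since the determinants

$$|2| = 2, \qquad \begin{vmatrix} 2 & -1 \\ -1 & 2 \end{vmatrix} = 3, \qquad \begin{vmatrix} 2 & -1 & -3 \\ -1 & 2 & 4 \\ -3 & 4 & 9 \end{vmatrix} = 1$$

are all positive. Thus, we are guaranteed that all eigenvalues of A are positive and $\mathbf{x}^TA\mathbf{x} > 0$ for $\mathbf{x} \neq \mathbf{0}$.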
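The determinant test of Theorem 7.3.4 is straightforward to automate. The following is a minimal sketch, assuming exact rational arithmetic and Gaussian elimination for the determinants; the function names `determinant`, `principal_minors`, and `is_positive_definite` are hypothetical.

```python
from fractions import Fraction

def determinant(M):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in M]
    n = len(M)
    det = Fraction(1)
    for col in range(n):
        # find a row at or below the diagonal with a nonzero entry in this column
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det  # a row interchange changes the sign
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n):
                M[r][k] -= factor * M[col][k]
    return det

def principal_minors(A):
    """Determinants of the first, second, ..., nth principal submatrices."""
    return [determinant([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

def is_positive_definite(A):
    """Determinant test for a symmetric matrix A (symmetry is not checked)."""
    return all(m > 0 for m in principal_minors(A))

A = [[2, -1, -3], [-1, 2, 4], [-3, 4, 9]]
minors = principal_minors(A)
print(minors, is_positive_definite(A))
```

Applied to the matrix of Example 4, the same test fails at the second principal submatrix, which is consistent with that matrix being indefinite.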


OPTIONAL 

We  conclude  this  section  with  an  optional  proof  of  Theorem  7.3.2. 

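It follows from the principal axes theorem (Theorem 7.3.1) that there is an orthogonal change of variable $\mathbf{x} = P\mathbf{y}$ for which

$$\mathbf{x}^TA\mathbf{x} = \mathbf{y}^TD\mathbf{y} = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \tag{23}$$

where the $\lambda$'s are the eigenvalues of A. Moreover, it follows from the invertibility of P that $\mathbf{y} \neq \mathbf{0}$ if and only if $\mathbf{x} \neq \mathbf{0}$, so the values of $\mathbf{x}^TA\mathbf{x}$ for $\mathbf{x} \neq \mathbf{0}$ are the same as the values of $\mathbf{y}^TD\mathbf{y}$ for $\mathbf{y} \neq \mathbf{0}$. Thus, it follows from (23) that $\mathbf{x}^TA\mathbf{x} > 0$ for $\mathbf{x} \neq \mathbf{0}$ if and only if all of the $\lambda$'s in that equation are positive, and that $\mathbf{x}^TA\mathbf{x} < 0$ for $\mathbf{x} \neq \mathbf{0}$ if and only if all of the $\lambda$'s are negative. This proves parts (a) and (b).

Assume that A has at least one positive eigenvalue and at least one negative eigenvalue, and to be specific, suppose that $\lambda_1 > 0$ and $\lambda_2 < 0$ in (23). Then

$$\mathbf{x}^TA\mathbf{x} > 0 \quad\text{if } y_1 = 1 \text{ and all other } y\text{'s are } 0$$

and

$$\mathbf{x}^TA\mathbf{x} < 0 \quad\text{if } y_2 = 1 \text{ and all other } y\text{'s are } 0$$

which proves that $\mathbf{x}^TA\mathbf{x}$ is indefinite. Conversely, if $\mathbf{x}^TA\mathbf{x} > 0$ for some $\mathbf{x}$, then $\mathbf{y}^TD\mathbf{y} > 0$ for some $\mathbf{y}$, so at least one of the $\lambda$'s in (23) must be positive. Similarly, if $\mathbf{x}^TA\mathbf{x} < 0$ for some $\mathbf{x}$, then $\mathbf{y}^TD\mathbf{y} < 0$ for some $\mathbf{y}$, so at least one of the $\lambda$'s in (23) must be negative, which completes the proof.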


Concept  Review 

Linear form
Quadratic  form 
Cross  product  term 

Quadratic  form  associated  with  a matrix 
Change  of  variable 
Orthogonal  change  of  variable 
Principal  Axes  Theorem 
Conic  section 


Degenerate  conic 
Central  conic 

Standard  position  of  a central  conic 
Standard  form  of  a central  conic 
Central  quadric 
Principal  axes  of  an  ellipse 
Positive  definite  quadratic  form 
Negative  definite  quadratic  form 
Indefinite  quadratic  form 
Positive  semidefinite  quadratic  form 
Negative  semidefinite  quadratic  form 
Principal  submatrix 

Skills 

Express a quadratic form in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where A is a symmetric matrix.

Find  an  orthogonal  change  of  variable  that  eliminates  the  cross  product  terms  in  a quadratic  form,  and  express  the 
quadratic  form  in  terms  of  the  new  variable. 

Identify  a conic  section  from  an  equation  by  rotating  axes  to  place  the  conic  in  standard  position,  and  find  the  angle  of 
rotation. 

Identify  a conic  section  using  eigenvalues. 

Classify  matrices  and  quadratic  forms  as  positive  definite,  negative  definite,  indefinite,  positive  semidefinite  or 
negative  semidefinite. 


Exercise  Set  7.3 


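In Exercises 1-2, express the quadratic form in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where A is a symmetric matrix.

1. (a) $3x_1^2 + 7x_2^2$

(b) $4x_1^2 - 9x_2^2 - 6x_1x_2$

(c) $9x_1^2 - x_2^2 + 4x_3^2 + 6x_1x_2 - 8x_1x_3 + x_2x_3$

Answer:

(a) $\begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 7 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$

(b) $\begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 4 & -3 \\ -3 & -9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$

(c) $\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 9 & 3 & -4 \\ 3 & -1 & \frac{1}{2} \\ -4 & \frac{1}{2} & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$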


2. (a) $5x_1^2 + 5x_1x_2$

(b) $-7x_1x_2$

(c) $x_1^2 + x_2^2 - 3x_3^2 - 5x_1x_2 + 9x_1x_3$

In  Exercises  3-4,  find  a formula  for  the  quadratic  form  that  does  not  use  matrices. 


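3. $\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 2 & -3 \\ -3 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$

Answer:

$2x^2 + 5y^2 - 6xy$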


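4. $\begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} -2 & \frac{7}{2} & 1 \\ \frac{7}{2} & 0 & 6 \\ 1 & 6 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$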


In  Exercises  5-8,  find  an  orthogonal  change  of  variables  that  eliminates  the  cross  product  terms  in  the  quadratic  form  Q,  and 
express  Q in  terms  of  the  new  variables. 

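5. $Q = 2x_1^2 + 2x_2^2 - 2x_1x_2$

Answer:

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}; \qquad Q = 3y_1^2 + y_2^2$$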


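6. $Q = 5x_1^2 + 2x_2^2 + 4x_3^2 + 4x_1x_2$

7. $Q = 3x_1^2 + 4x_2^2 + 5x_3^2 + 4x_1x_2 - 4x_2x_3$

Answer:

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -\frac{2}{3} & \frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & \frac{2}{3} & -\frac{2}{3} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}; \qquad Q = y_1^2 + 4y_2^2 + 7y_3^2$$

8. $Q = 2x_1^2 + 5x_2^2 + 5x_3^2 + 4x_1x_2 - 4x_1x_3 - 8x_2x_3$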

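In Exercises 9-10, express the quadratic equation in the matrix form $\mathbf{x}^TA\mathbf{x} + K\mathbf{x} + f = 0$, where $\mathbf{x}^TA\mathbf{x}$ is the associated quadratic form and K is an appropriate matrix.

9. (a) $2x^2 + xy + x - 6y + 2 = 0$

(b) $y^2 + 7x - 8y - 5 = 0$

Answer:

(a) $\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 2 & \frac{1}{2} \\ \frac{1}{2} & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 1 & -6 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + 2 = 0$

(b) $\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 7 & -8 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} - 5 = 0$

10. (a) $x^2 - xy + 5x + 8y - 3 = 0$

(b) $5xy = 8$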

In  Exercises  11-12,  identify  the  conic  section  represented  by  the  equation. 

11. (a) $2x^2 + 5y^2 = 20$

(b) $x^2 - y^2 - 8 = 0$

(c) $7y^2 - 2x = 0$

(d) $x^2 + y^2 - 25 = 0$


Answer: 

(a)  ellipse 

(b)  hyperbola 

(c)  parabola 

(d)  circle 

12. (a) $4x^2 + 9y^2 = 1$

(b) $4x^2 - 5y^2 = 20$

(c) $-x^2 = 2y$

(d) $x^2 - 3 = -y^2$

In  Exercises  13-16,  identify  the  conic  section  represented  by  the  equation  by  rotating  axes  to  place  the  conic  in  standard 
position.  Find  an  equation  of  the  conic  in  the  rotated  coordinates,  and  find  the  angle  of  rotation. 

13. $2x^2 - 4xy - y^2 + 8 = 0$

Answer:

Hyperbola: $2(y')^2 - 3(x')^2 = 8$; $\theta \approx -26.6^\circ$

14. $5x^2 + 4xy + 5y^2 = 9$

15. $11x^2 + 24xy + 4y^2 - 15 = 0$

Answer:

Hyperbola: $4(x')^2 - (y')^2 = 3$; $\theta \approx 36.9^\circ$

16. $x^2 + xy + y^2 = \frac{1}{2}$

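In Exercises 17-18, determine by inspection whether the matrix is positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite.

17. (a) $\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$

(b) $\begin{bmatrix} -1 & 0 \\ 0 & -2 \end{bmatrix}$

(c) $\begin{bmatrix} -1 & 0 \\ 0 & 2 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$

(e) $\begin{bmatrix} 0 & 0 \\ 0 & -2 \end{bmatrix}$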

Answer: 


(a)  Positive  definite 

(b)  Negative  definite 

(c)  Indefinite 

(d)  Positive  semidefinite 

(e)  Negative  semidefinite 


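18. (a) $\begin{bmatrix} 2 & 0 \\ 0 & -5 \end{bmatrix}$

(b) $\begin{bmatrix} -2 & 0 \\ 0 & -5 \end{bmatrix}$

(c) $\begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix}$

(d) $\begin{bmatrix} 0 & 0 \\ 0 & -5 \end{bmatrix}$

(e) $\begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}$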


In Exercises 19-24, classify the quadratic form as positive definite, negative definite, indefinite, positive semidefinite, or negative semidefinite.

19. $x_1^2 + x_2^2$

Answer:

Positive definite

20. $-x_1^2 - 3x_2^2$

21. $(x_1 - x_2)^2$

Answer:

Positive semidefinite

22. $-(x_1 - x_2)^2$


23. $x_1^2 - x_2^2$

Answer:

Indefinite

24. $x_1x_2$
In  Exercises  25-26,  show  that  the  matrix  A is  positive  definite  first  by  using  Theorem  7.3.2  and  second  by  using  Theorem 
7.3.4. 


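25. (a) $A = \begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix}$

(b) $A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & 0 \\ 0 & 0 & 5 \end{bmatrix}$

26. (a) $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$

(b) $A = \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 3 \end{bmatrix}$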

In  Exercises  27-28,  find  all  values  of  k for  which  the  quadratic  form  is  positive  definite. 

27. $5x_1^2 + x_2^2 + kx_3^2 + 4x_1x_2 - 2x_1x_3 - 2x_2x_3$

Answer:

$k > 2$

28. $3x_1^2 + x_2^2 + 2x_3^2 - 2x_1x_3 + 2kx_2x_3$

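29. Let $\mathbf{x}^TA\mathbf{x}$ be a quadratic form in the variables $x_1, x_2, \ldots, x_n$, and define $T: R^n \to R$ by $T(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$.

(a) Show that $T(\mathbf{x} + \mathbf{y}) = T(\mathbf{x}) + 2\mathbf{x}^TA\mathbf{y} + T(\mathbf{y})$.

(b) Show that $T(c\mathbf{x}) = c^2T(\mathbf{x})$.

30. Express the quadratic form $(c_1x_1 + c_2x_2 + \cdots + c_nx_n)^2$ in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where A is symmetric.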

31.  In  statistics,  the  quantities 

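$$\bar{x} = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$$

and

$$s_x^2 = \frac{1}{n-1}\left[(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right]$$

are called, respectively, the sample mean and sample variance of $\mathbf{x} = (x_1, x_2, \ldots, x_n)$.

(a) Express the quadratic form $s_x^2$ in the matrix notation $\mathbf{x}^TA\mathbf{x}$, where A is symmetric.

(b) Is $s_x^2$ a positive definite quadratic form? Explain.

Answer:

(a) $$A = \begin{bmatrix} \frac{1}{n} & -\frac{1}{n(n-1)} & \cdots & -\frac{1}{n(n-1)} \\ -\frac{1}{n(n-1)} & \frac{1}{n} & \cdots & -\frac{1}{n(n-1)} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{n(n-1)} & -\frac{1}{n(n-1)} & \cdots & \frac{1}{n} \end{bmatrix}$$

(b) Yes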


32. The graph in an xyz-coordinate system of an equation of form $ax^2 + by^2 + cz^2 = 1$ in which a, b, and c are positive is a surface called a central ellipsoid in standard position (see the accompanying figure). This is the three-dimensional generalization of the ellipse $ax^2 + by^2 = 1$ in the xy-plane. The intersections of the ellipsoid $ax^2 + by^2 + cz^2 = 1$ with the coordinate axes determine three line segments called the axes of the ellipsoid. If a central ellipsoid is rotated about the origin so two or more of its axes do not coincide with any of the coordinate axes, then the resulting equation will have one or more cross product terms.

(a)  Show  that  the  equation 


$$\tfrac{4}{3}x^2 + \tfrac{4}{3}y^2 + \tfrac{4}{3}z^2 + \tfrac{4}{3}xy + \tfrac{4}{3}xz + \tfrac{4}{3}yz = 1$$

represents an ellipsoid, and find the lengths of its axes. [Suggestion: Write the equation in the form $\mathbf{x}^TA\mathbf{x} = 1$ and make an orthogonal change of variable to eliminate the cross product terms.]

(b)  What  property  must  a symmetric  3x3  matrix  have  in  order  for  the  equation  x^Ax  = 1 to  represent  an  ellipsoid? 


Figure  Ex-32 

33. What property must a symmetric $2 \times 2$ matrix A have for $\mathbf{x}^TA\mathbf{x} = 1$ to represent a circle?


Answer: 


A must  have  a positive  eigenvalue  of  multiplicity  2. 

34. Prove: If $b \neq 0$, then the cross product term can be eliminated from the quadratic form $ax^2 + 2bxy + cy^2$ by rotating the coordinate axes through an angle $\theta$ that satisfies the equation

$$\cot 2\theta = \frac{a - c}{2b}$$

35. Prove that if A is an $n \times n$ symmetric matrix all of whose eigenvalues are nonnegative, then $\mathbf{x}^TA\mathbf{x} \geq 0$ for all nonzero $\mathbf{x}$ in $R^n$.

True-False  Exercises 


In  parts  (a)-(l)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  A symmetric  matrix  with  positive  definite  eigenvalues  is  positive  definite. 

Answer: 

True 

(b) $x_1^2 - x_2^2 + x_3^2 + 4x_1x_2x_3$ is a quadratic form.

Answer: 

False 

(c) $(x_1 - 3x_2)^2$ is a quadratic form.

Answer: 

True 

(d)  A positive  definite  matrix  is  invertible. 


Answer: 


True 

(e)  A symmetric  matrix  is  either  positive  definite,  negative  definite,  or  indefinite. 

Answer: 

False 

(f) If A is positive definite, then $-A$ is negative definite.

Answer:

True

(g) $\mathbf{x} \cdot \mathbf{x}$ is a quadratic form for all $\mathbf{x}$ in $R^n$.

Answer:

True

(h) If $\mathbf{x}^TA\mathbf{x}$ is a positive definite quadratic form, then so is $\mathbf{x}^TA^{-1}\mathbf{x}$.

Answer:

True

(i) If A is a matrix with only positive eigenvalues, then $\mathbf{x}^TA\mathbf{x}$ is a positive definite quadratic form.

Answer:

False

(j) If A is a $2 \times 2$ symmetric matrix with positive entries and $\det(A) > 0$, then A is positive definite.

Answer:

True

(k) If $\mathbf{x}^TA\mathbf{x}$ is a quadratic form with no cross product terms, then A is a diagonal matrix.

Answer:

False

(l) If $\mathbf{x}^TA\mathbf{x}$ is a positive definite quadratic form in two variables and $c \neq 0$, then the graph of the equation $\mathbf{x}^TA\mathbf{x} = c$ is an ellipse.

Answer:

False
7.4 Optimization Using Quadratic Forms

Quadratic  forms  arise  in  various  problems  in  which  the  maximum  or  minimum  value  of  some  quantity  is  required. 
In  this  section  we  will  discuss  some  problems  of  this  type. 


Constrained  Extremum  Problems 

Our first goal in this section is to consider the problem of finding the maximum and minimum values of a quadratic form $\mathbf{x}^TA\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$. Problems of this type arise in a wide variety of applications.

To visualize this problem geometrically in the case where $\mathbf{x}^TA\mathbf{x}$ is a quadratic form on $R^2$, view $z = \mathbf{x}^TA\mathbf{x}$ as the equation of some surface in a rectangular xyz-coordinate system and view $\|\mathbf{x}\| = 1$ as the unit circle centered at the origin of the xy-plane. Geometrically, the problem of finding the maximum and minimum values of $\mathbf{x}^TA\mathbf{x}$ subject to the requirement $\|\mathbf{x}\| = 1$ amounts to finding the highest and lowest points on the intersection of the surface with the right circular cylinder determined by the circle (Figure 7.4.1).

Figure 7.4.1 (labels: constrained minimum, constrained maximum, unit circle)


The  following  theorem,  whose  proof  is  deferred  to  the  end  of  the  section,  is  the  key  result  for  solving  problems  of 
this  type. 


THEOREM 7.4.1 Constrained Extremum Theorem

Let A be a symmetric $n \times n$ matrix whose eigenvalues in order of decreasing size are $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$. Then:

(a) the quadratic form $\mathbf{x}^TA\mathbf{x}$ attains a maximum value and a minimum value on the set of vectors for which $\|\mathbf{x}\| = 1$;

(b) the maximum value attained in part (a) occurs at a unit vector corresponding to the eigenvalue $\lambda_1$;

(c) the minimum value attained in part (a) occurs at a unit vector corresponding to the eigenvalue $\lambda_n$.

The condition $\|\mathbf{x}\| = 1$ in this theorem is called a constraint, and the maximum or minimum value of $\mathbf{x}^TA\mathbf{x}$ subject to the constraint is called a constrained extremum. This constraint can also be expressed as $\mathbf{x}^T\mathbf{x} = 1$ or as $x_1^2 + x_2^2 + \cdots + x_n^2 = 1$, when convenient.


EXAMPLE  1 Finding  Constrained  Extrema 


Find the maximum and minimum values of the quadratic form

$$z = 5x^2 + 5y^2 + 4xy$$

subject to the constraint $x^2 + y^2 = 1$.


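The quadratic form can be expressed in matrix notation as

$$z = 5x^2 + 5y^2 + 4xy = \mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 5 & 2 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

We leave it for you to show that the eigenvalues of A are $\lambda_1 = 7$ and $\lambda_2 = 3$ and that corresponding eigenvectors are

$$\lambda_1 = 7: \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \lambda_2 = 3: \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

Normalizing these eigenvectors yields

$$\lambda_1 = 7: \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}, \qquad \lambda_2 = 3: \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \tag{1}$$

Thus, the constrained extrema are

constrained maximum: $z = 7$ at $(x, y) = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$

constrained minimum: $z = 3$ at $(x, y) = \left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$

Since the negatives of the eigenvectors in (1) are also unit eigenvectors, they too produce the maximum and minimum values of z; that is, the constrained maximum $z = 7$ also occurs at the point $(x, y) = \left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$ and the constrained minimum $z = 3$ at $(x, y) = \left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$.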

EXAMPLE  2 A Constrained  Extremum  Problem 

A rectangle is to be inscribed in the ellipse $4x^2 + 9y^2 = 36$, as shown in Figure 7.4.2. Use eigenvalue methods to find nonnegative values of x and y that produce the inscribed rectangle with maximum area.

Figure 7.4.2 A rectangle inscribed in the ellipse $4x^2 + 9y^2 = 36$.

The area z of the inscribed rectangle is given by $z = 4xy$, so the problem is to maximize the quadratic form $z = 4xy$ subject to the constraint $4x^2 + 9y^2 = 36$. In this problem, the graph of the constraint equation is an ellipse rather than the unit circle as required in Theorem 7.4.1, but we can remedy this problem by rewriting the constraint as

$$\left(\frac{x}{3}\right)^2 + \left(\frac{y}{2}\right)^2 = 1$$

and defining new variables, $x_1$ and $y_1$, by the equations

$$x = 3x_1 \quad\text{and}\quad y = 2y_1$$

This enables us to reformulate the problem as follows:

maximize $z = 4xy = 24x_1y_1$

subject to the constraint

$$x_1^2 + y_1^2 = 1$$

To solve this problem, we will write the quadratic form $z = 24x_1y_1$ as

$$z = \mathbf{x}^TA\mathbf{x} = \begin{bmatrix} x_1 & y_1 \end{bmatrix} \begin{bmatrix} 0 & 12 \\ 12 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$$

We now leave it for you to show that the largest eigenvalue of A is $\lambda = 12$ and that the only corresponding unit eigenvector with nonnegative entries is

$$\mathbf{x} = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$$

Thus, the maximum area is $z = 12$, and this occurs when

$$x = 3x_1 = \frac{3}{\sqrt{2}} \quad\text{and}\quad y = 2y_1 = \frac{2}{\sqrt{2}}$$

Constrained  Extrema  and  Level  Curves 

A useful way of visualizing the behavior of a function $f(x, y)$ of two variables is to consider the curves in the xy-plane along which $f(x, y)$ is constant. These curves have equations of the form

$$f(x, y) = k$$

and are called the level curves of f (Figure 7.4.3). In particular, the level curves of a quadratic form $\mathbf{x}^TA\mathbf{x}$ on $R^2$ have equations of the form

$$\mathbf{x}^TA\mathbf{x} = k \tag{2}$$

so the maximum and minimum values of $\mathbf{x}^TA\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$ are the largest and smallest values of k for which the graph of (2) intersects the unit circle. Typically, such values of k produce level curves that just touch the unit circle (Figure 7.4.4), and the coordinates of the points where the level curves just touch produce the vectors that maximize or minimize $\mathbf{x}^TA\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$.

[Figure labels: surface $z = f(x, y)$; plane $z = k$]

Figure 7.4.4


EXAMPLE  3 Example  1 Revisited  Using  Level  Curves 


In Example 1 (and its following remark) we found the maximum and minimum values of the quadratic form

$$z = 5x^2 + 5y^2 + 4xy$$

subject to the constraint $x^2 + y^2 = 1$. We showed that the constrained maximum is $z = 7$, and this is attained at the points

$$(x, y) = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) \quad\text{and}\quad (x, y) = \left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right) \tag{3}$$

and that the constrained minimum is $z = 3$, and this is attained at the points

$$(x, y) = \left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) \quad\text{and}\quad (x, y) = \left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right) \tag{4}$$

Geometrically, this means that the level curve $5x^2 + 5y^2 + 4xy = 7$ should just touch the unit circle at the points in (3), and the level curve $5x^2 + 5y^2 + 4xy = 3$ should just touch it at the points in (4). All of this is consistent with Figure 7.4.5.

Figure 7.4.5


CALCULUS  REQUIRED 

Relative  Extrema  of  Functions  of  Two  Variables 

We  will  conclude  this  section  by  showing  how  quadratic  forms  can  be  used  to  study  characteristics  of  real-valued 
functions  of  two  variables. 

Recall that if a function $f(x, y)$ has first-order partial derivatives, then its relative maxima and minima, if any, occur at points where

$$f_x(x, y) = 0 \quad\text{and}\quad f_y(x, y) = 0$$

These are called critical points of f. The specific behavior of f at a critical point $(x_0, y_0)$ is determined by the sign of

$$D(x, y) = f(x, y) - f(x_0, y_0) \tag{5}$$

at points $(x, y)$ that are close to, but different from, $(x_0, y_0)$:

• If $D(x, y) > 0$ at points $(x, y)$ that are sufficiently close to, but different from, $(x_0, y_0)$, then $f(x_0, y_0) < f(x, y)$ at such points and f is said to have a relative minimum at $(x_0, y_0)$ (Figure 7.4.6a).

• If $D(x, y) < 0$ at points $(x, y)$ that are sufficiently close to, but different from, $(x_0, y_0)$, then $f(x_0, y_0) > f(x, y)$ at such points and f is said to have a relative maximum at $(x_0, y_0)$ (Figure 7.4.6b).

• If $D(x, y)$ has both positive and negative values inside every circle centered at $(x_0, y_0)$, then there are points $(x, y)$ that are arbitrarily close to $(x_0, y_0)$ at which $f(x_0, y_0) < f(x, y)$ and points $(x, y)$ that are arbitrarily close to $(x_0, y_0)$ at which $f(x_0, y_0) > f(x, y)$. In this case we say that f has a saddle point at $(x_0, y_0)$ (Figure 7.4.6c).


Figure 7.4.6 (a) Relative minimum at (0, 0); (b) Relative maximum at (0, 0); (c) Saddle point at (0, 0)

In general, it can be difficult to determine the sign of D directly. However, the following theorem, which is proved in calculus, makes it possible to analyze critical points using derivatives.


THEOREM 7.4.2 Second Derivative Test

Suppose that $(x_0, y_0)$ is a critical point of $f(x, y)$ and that f has continuous second-order partial derivatives in some circular region centered at $(x_0, y_0)$. Then:

(a) f has a relative minimum at $(x_0, y_0)$ if

$$f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0 \quad\text{and}\quad f_{xx}(x_0, y_0) > 0$$

(b) f has a relative maximum at $(x_0, y_0)$ if

$$f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0 \quad\text{and}\quad f_{xx}(x_0, y_0) < 0$$

(c) f has a saddle point at $(x_0, y_0)$ if

$$f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) < 0$$

(d) The test is inconclusive if

$$f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) = 0$$

Our  interest  here  is  in  showing  how  to  reformulate  this  theorem  using  properties  of  symmetric  matrices.  For  this 
purpose  we  consider  the  symmetric  matrix 


$$H(x, y) = \begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{xy}(x, y) & f_{yy}(x, y) \end{bmatrix}$$

which is called the Hessian or Hessian matrix of f in honor of the German mathematician and scientist Ludwig Otto Hesse (1811-1874). The notation $H(x, y)$ emphasizes that the entries in the matrix depend on x and y. The Hessian is of interest because

$$\det[H(x_0, y_0)] = \begin{vmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{xy}(x_0, y_0) & f_{yy}(x_0, y_0) \end{vmatrix} = f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0)$$

is the expression that appears in Theorem 7.4.2. We can now reformulate the second derivative test as follows.


THEOREM 7.4.3 Hessian Form of the Second Derivative Test

Suppose that $(x_0, y_0)$ is a critical point of $f(x, y)$ and that f has continuous second-order partial derivatives in some circular region centered at $(x_0, y_0)$. If $H(x_0, y_0)$ is the Hessian of f at $(x_0, y_0)$, then:

(a) f has a relative minimum at $(x_0, y_0)$ if $H(x_0, y_0)$ is positive definite.

(b) f has a relative maximum at $(x_0, y_0)$ if $H(x_0, y_0)$ is negative definite.

(c) f has a saddle point at $(x_0, y_0)$ if $H(x_0, y_0)$ is indefinite.

(d) The test is inconclusive otherwise.


We will prove part (a). The proofs of the remaining parts will be left as exercises.

If $H(x_0, y_0)$ is positive definite, then Theorem 7.3.4 implies that the principal submatrices of $H(x_0, y_0)$ have positive determinants. Thus,

$$\det[H(x_0, y_0)] = \begin{vmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{xy}(x_0, y_0) & f_{yy}(x_0, y_0) \end{vmatrix} = f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}^2(x_0, y_0) > 0$$

and

$$\det[f_{xx}(x_0, y_0)] = f_{xx}(x_0, y_0) > 0$$

so f has a relative minimum at $(x_0, y_0)$ by part (a) of Theorem 7.4.2.


EXAMPLE  4 Using  the  Hessian  to  Classify  Relative  Extrema 


Find the critical points of the function

$$f(x, y) = \tfrac{1}{3}x^3 + xy^2 - 8xy + 3$$

and  use  the  eigenvalues  of  the  Hessian  matrix  at  those  points  to  determine  which  of  them,  if  any,  are 
relative  maxima,  relative  minima,  or  saddle  points. 


To find both the critical points and the Hessian matrix we will need to calculate the first and second partial derivatives of f. These derivatives are

$$f_x(x, y) = x^2 + y^2 - 8y, \qquad f_y(x, y) = 2xy - 8x, \qquad f_{xy}(x, y) = 2y - 8$$

$$f_{xx}(x, y) = 2x, \qquad f_{yy}(x, y) = 2x$$

Thus, the Hessian matrix is

$$H(x, y) = \begin{bmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{xy}(x, y) & f_{yy}(x, y) \end{bmatrix} = \begin{bmatrix} 2x & 2y - 8 \\ 2y - 8 & 2x \end{bmatrix}$$

To find the critical points we set $f_x$ and $f_y$ equal to zero. This yields the equations

$$f_x(x, y) = x^2 + y^2 - 8y = 0 \quad\text{and}\quad f_y(x, y) = 2xy - 8x = 2x(y - 4) = 0$$

Solving the second equation yields $x = 0$ or $y = 4$. Substituting $x = 0$ in the first equation and solving for y yields $y = 0$ or $y = 8$; and substituting $y = 4$ into the first equation and solving for x yields $x = 4$ or $x = -4$. Thus, we have four critical points:

(0, 0), (0, 8), (4, 4), (-4, 4)


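Evaluating the Hessian matrix at these points yields

$$H(0, 0) = \begin{bmatrix} 0 & -8 \\ -8 & 0 \end{bmatrix}, \qquad H(0, 8) = \begin{bmatrix} 0 & 8 \\ 8 & 0 \end{bmatrix}$$

$$H(4, 4) = \begin{bmatrix} 8 & 0 \\ 0 & 8 \end{bmatrix}, \qquad H(-4, 4) = \begin{bmatrix} -8 & 0 \\ 0 & -8 \end{bmatrix}$$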

We leave it for you to find the eigenvalues of these matrices and deduce the following classifications of the critical points:

Critical Point (x0, y0)    λ1    λ2    Classification
(0, 0)                      8    -8    Saddle point
(0, 8)                      8    -8    Saddle point
(4, 4)                      8     8    Relative minimum
(-4, 4)                    -8    -8    Relative maximum
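The classifications in the table can be reproduced with a short computation. The sketch below is an illustration only: the Hessian entries are the partial derivatives computed in this example, the eigenvalues of each symmetric 2 x 2 matrix come from the quadratic formula, and the function names `hessian` and `classify_critical_point` are hypothetical.

```python
import math

def hessian(x, y):
    """Hessian of f(x, y) = x**3/3 + x*y**2 - 8*x*y + 3."""
    return [[2 * x, 2 * y - 8],
            [2 * y - 8, 2 * x]]

def classify_critical_point(H):
    """Return (lambda_1, lambda_2, classification) for a symmetric 2 x 2 Hessian."""
    (a, b), (_, c) = H
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lam1, lam2 = mean + radius, mean - radius
    if lam2 > 0:
        kind = "relative minimum"     # positive definite
    elif lam1 < 0:
        kind = "relative maximum"     # negative definite
    elif lam1 > 0 > lam2:
        kind = "saddle point"         # indefinite
    else:
        kind = "inconclusive"
    return lam1, lam2, kind

critical_points = [(0, 0), (0, 8), (4, 4), (-4, 4)]
results = {p: classify_critical_point(hessian(*p)) for p in critical_points}
for point, (lam1, lam2, kind) in results.items():
    print(point, lam1, lam2, kind)
```

Each line of output corresponds to one row of the table above.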

OPTIONAL 

We  conclude  this  section  with  an  optional  proof  of  Theorem  7.4.1. 

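The first step in the proof is to show that $\mathbf{x}^TA\mathbf{x}$ has constrained maximum and minimum values for $\|\mathbf{x}\| = 1$. Since A is symmetric, the principal axes theorem (Theorem 7.3.1) implies that there is an orthogonal change of variable $\mathbf{x} = P\mathbf{y}$ such that

$$\mathbf{x}^TA\mathbf{x} = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \tag{6}$$

in which $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of A. Let us assume that $\|\mathbf{x}\| = 1$ and that the column vectors of P (which are unit eigenvectors of A) have been ordered so that

$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \tag{7}$$

Since the matrix P is orthogonal, multiplication by P is length preserving, so that $\|\mathbf{y}\| = \|\mathbf{x}\| = 1$; that is,

$$y_1^2 + y_2^2 + \cdots + y_n^2 = 1$$

It follows from this equation and (7) that

$$\lambda_n = \lambda_n(y_1^2 + y_2^2 + \cdots + y_n^2) \leq \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2 \leq \lambda_1(y_1^2 + y_2^2 + \cdots + y_n^2) = \lambda_1$$

and hence from (6) that

$$\lambda_n \leq \mathbf{x}^TA\mathbf{x} \leq \lambda_1$$

This shows that all values of $\mathbf{x}^TA\mathbf{x}$ for which $\|\mathbf{x}\| = 1$ lie between the largest and smallest eigenvalues of A. Now let $\mathbf{x}$ be a unit eigenvector corresponding to $\lambda_1$. Then

$$\mathbf{x}^TA\mathbf{x} = \mathbf{x}^T(\lambda_1\mathbf{x}) = \lambda_1\mathbf{x}^T\mathbf{x} = \lambda_1\|\mathbf{x}\|^2 = \lambda_1$$

which shows that $\mathbf{x}^TA\mathbf{x}$ has $\lambda_1$ as a constrained maximum and that this maximum occurs if $\mathbf{x}$ is a unit eigenvector of A corresponding to $\lambda_1$. Similarly, if $\mathbf{x}$ is a unit eigenvector corresponding to $\lambda_n$, then

$$\mathbf{x}^TA\mathbf{x} = \mathbf{x}^T(\lambda_n\mathbf{x}) = \lambda_n\mathbf{x}^T\mathbf{x} = \lambda_n\|\mathbf{x}\|^2 = \lambda_n$$

so $\mathbf{x}^TA\mathbf{x}$ has $\lambda_n$ as a constrained minimum and this minimum occurs if $\mathbf{x}$ is a unit eigenvector of A corresponding to $\lambda_n$. This completes the proof.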


Concept  Review 

Constraint 

Constrained  extremum 
Level  curve 
Critical  point 
Relative  minimum 
Relative  maximum 
Saddle  point 
Second  derivative  test 
Hessian  matrix 

Skills 

Find  the  maximum  and  minimum  values  of  a quadratic  form  subject  to  a constraint. 

Find  the  critical  points  of  a real-valued  function  of  two  variables,  and  use  the  eigenvalues  of  the  Hessian 
matrix  at  the  critical  points  to  classify  them  as  relative  maxima,  relative  minima,  or  saddle  points. 


Exercise  Set  7.4 

In Exercises 1-4, find the maximum and minimum values of the given quadratic form subject to the constraint $x^2 + y^2 = 1$, and determine the values of x and y at which the maximum and minimum occur.

1. $5x^2 - y^2$

Answer:

Maximum: 5 at (1, 0) and (-1, 0); minimum: -1 at (0, 1) and (0, -1)

2. $xy$

3. $3x^2 + 7y^2$

Answer:

Maximum: 7 at (0, 1) and (0, -1); minimum: 3 at (1, 0) and (-1, 0)

4. $5x^2 + 5xy$

In Exercises 5-6, find the maximum and minimum values of the given quadratic form subject to the constraint

$x^2 + y^2 + z^2 = 1$

and determine the values of x, y, and z at which the maximum and minimum occur.

5. $9x^2 + 4y^2 + 3z^2$

Answer:

Maximum: 9 at (1, 0, 0) and (-1, 0, 0); minimum: 3 at (0, 0, 1) and (0, 0, -1)

6. $2x^2 + y^2 + z^2 + 2xy + 2xz$

7. Use the method of Example 2 to find the maximum and minimum values of $xy$ subject to the constraint $4x^2 + 8y^2 = 16$.

Answer:

Maximum: $z = \sqrt{2}$ at $(x, y) = (\sqrt{2}, 1)$ and $(-\sqrt{2}, -1)$; minimum: $z = -\sqrt{2}$ at $(x, y) = (-\sqrt{2}, 1)$ and $(\sqrt{2}, -1)$

8. Use the method of Example 2 to find the maximum and minimum values of $x^2 + xy + 2y^2$ subject to the constraint $x^2 + 3y^2 = 16$.

In  Exercises  9-10,  draw  the  unit  circle  and  the  level  curves  corresponding  to  the  given  quadratic  form.  Show  that 
the  unit  circle  intersects  each  of  these  curves  in  exactly  two  places,  label  the  intersection  points,  and  verify  that 
the  constrained  extrema  occur  at  those  points. 

9.  5 x2-y2 
Answer: 


10.  xy 

11. (a) Show that the function $f(x, y) = 4xy - x^4 - y^4$ has critical points at (0, 0), (1, 1), and (-1, -1).

(b) Use the Hessian form of the second derivative test to show f has relative maxima at (1, 1) and (-1, -1) and a saddle point at (0, 0).

12. (a) Show that the function $f(x, y) = x^3 - 6xy - y^3$ has critical points at (0, 0) and (-2, 2).

(b) Use the Hessian form of the second derivative test to show f has a relative maximum at (-2, 2) and a saddle point at (0, 0).

In Exercises 13-16, find the critical points of f, if any, and classify them as relative maxima, relative minima, or saddle points.

13. $f(x, y) = x^3 - 3xy - y^3$

Answer:

Critical points: (-1, 1), relative maximum; (0, 0), saddle point

14. $f(x, y) = x^3 - 3xy + y^3$

15. $f(x, y) = x^2 + 2y^2 - x^2y$

Answer:

Critical points: (0, 0), relative minimum; (2, 1) and (-2, 1), saddle points

16. $f(x, y) = x^3 + y^3 - 3x - 3y$

17. A rectangle whose center is at the origin and whose sides are parallel to the coordinate axes is to be inscribed in the ellipse $x^2 + 25y^2 = 25$. Use the method of Example 2 to find nonnegative values of x and y that produce the inscribed rectangle with maximum area.

Answer:

Corner points: $(x, y) = \left(\pm\frac{5}{\sqrt{2}}, \pm\frac{1}{\sqrt{2}}\right)$

18. Suppose that the temperature at a point (x, y) on a metal plate is $T(x, y) = 4x^2 - 4xy + y^2$. An ant, walking on the plate, traverses a circle of radius 5 centered at the origin. What are the highest and lowest temperatures encountered by the ant?

19. (a) Show that the functions

$f(x, y) = x^4 + y^4$ and $g(x, y) = x^4 - y^4$

have a critical point at (0, 0) but the second derivative test is inconclusive at that point.

(b) Give a reasonable argument to show that f has a relative minimum at (0, 0) and g has a saddle point at (0, 0).


20.  Suppose  that  the  Hessian  matrix  of  a certain  quadratic  form  f (x,  y)  is 


What  can  you  say  about  the  location  and  classification  of  the  critical  points  of /? 

21. Suppose that A is an n × n symmetric matrix and

$q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x}$

where $\mathbf{x}$ is a vector in $R^n$ that is expressed in column form. What can you say about the value of q if $\mathbf{x}$ is a unit eigenvector corresponding to an eigenvalue $\lambda$ of A?

Answer:

$q(\mathbf{x}) = \lambda$


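22. Prove: If $\mathbf{x}^T A\mathbf{x}$ is a quadratic form whose minimum and maximum values subject to the constraint $\|\mathbf{x}\| = 1$ are m and M, respectively, then for each number c in the interval $m \leq c \leq M$, there is a unit vector $\mathbf{x}_c$ such that $\mathbf{x}_c^T A\mathbf{x}_c = c$. [Hint: In the case where $m < M$, let $\mathbf{u}_m$ and $\mathbf{u}_M$ be unit eigenvectors of A such that $\mathbf{u}_m^T A\mathbf{u}_m = m$ and $\mathbf{u}_M^T A\mathbf{u}_M = M$, and let

$\mathbf{x}_c = \sqrt{\frac{M - c}{M - m}}\,\mathbf{u}_m + \sqrt{\frac{c - m}{M - m}}\,\mathbf{u}_M$

Show that $\mathbf{x}_c^T A\mathbf{x}_c = c$.]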


True-False  Exercises 


In  parts  (a)-(e)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  A quadratic  form  must  have  either  a maximum  or  minimum  value. 

Answer: 

False 

(b) The maximum value of a quadratic form $\mathbf{x}^T A\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$ occurs at a unit eigenvector corresponding to the largest eigenvalue of A.

Answer:

True

(c) The Hessian matrix of a function f with continuous second-order partial derivatives is a symmetric matrix.

Answer:

True

(d) If $(x_0, y_0)$ is a critical point of a function f and the Hessian of f at $(x_0, y_0)$ is 0, then f has neither a relative maximum nor a relative minimum at $(x_0, y_0)$.

Answer:

False

(e) If A is a symmetric matrix and det(A) < 0, then the minimum of $\mathbf{x}^T A\mathbf{x}$ subject to the constraint $\|\mathbf{x}\| = 1$ is negative.

Answer: 

True 
7.5 Hermitian, Unitary, and Normal Matrices

We  know  that  every  real  symmetric  matrix  is  orthogonally  diagonalizable  and  that  the  real  symmetric  matrices 
are  the  only  orthogonally  diagonalizable  matrices.  In  this  section  we  will  consider  the  diagonalization  problem 
for  complex  matrices. 


Hermitian  and  Unitary  Matrices 

The  transpose  operation  is  less  important  for  complex  matrices  than  for  real  matrices.  A more  useful  operation 
for  complex  matrices  is  given  in  the  following  definition. 


DEFINITION  1 

If A is a complex matrix, then the conjugate transpose of A, denoted by $A^*$, is defined by

$A^* = \overline{A}^T$ (1)


Since part (b) of Theorem 5.3.2 states that $(\overline{A})^T = \overline{(A^T)}$, the order in which the transpose and conjugation operations are performed in computing $A^* = \overline{A}^T$ does not matter. Moreover, in the case where A has real entries we have $A^* = (\overline{A})^T = A^T$, so $A^*$ is the same as $A^T$ for real matrices.


EXAMPLE  1 Conjugate  Transpose 


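Find the conjugate transpose $A^*$ of the matrix

$A = \begin{bmatrix} 1+i & -i & 0 \\ 2 & 3-2i & i \end{bmatrix}$

Solution We have

$\overline{A} = \begin{bmatrix} 1-i & i & 0 \\ 2 & 3+2i & -i \end{bmatrix}$ and hence $A^* = \overline{A}^T = \begin{bmatrix} 1-i & 2 \\ i & 3+2i \\ 0 & -i \end{bmatrix}$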
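The computation in Example 1 can be mirrored in a few lines of Python, which has built-in complex numbers (written with `j` for the imaginary unit). This is a minimal sketch; the function name `conjugate_transpose` and the list-of-rows representation are choices made here for illustration.

```python
def conjugate_transpose(a):
    """Return A* = conj(A)^T for a matrix stored as a list of rows."""
    rows, cols = len(a), len(a[0])
    # Entry (j, i) of A* is the complex conjugate of entry (i, j) of A.
    return [[a[i][j].conjugate() for i in range(rows)] for j in range(cols)]

# The 2x3 matrix of Example 1.
A = [[1 + 1j, -1j, 0],
     [2, 3 - 2j, 1j]]

A_star = conjugate_transpose(A)
for row in A_star:
    print(row)
```

Applying `conjugate_transpose` twice returns the original matrix, which is the property $(A^*)^* = A$.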

The  following  theorem,  parts  of  which  are  given  as  exercises,  shows  that  the  basic  algebraic 
properties  of  the  conjugate  transpose  operation  are  similar  to  those  of  the  transpose  (compare  to 
Theorem  1.4.8). 


THEOREM  7.5.1 


If k is a complex scalar, and if A, B, and C are complex matrices whose sizes are such that the stated operations can be performed, then:

(a) $(A^*)^* = A$

(b) $(A + B)^* = A^* + B^*$

(c) $(A - B)^* = A^* - B^*$

(d) $(kA)^* = \overline{k}A^*$

(e) $(AB)^* = B^*A^*$


Note that the relationship $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v}}^T\mathbf{u}$ in Formula 5 of Section 5.3 can be expressed in terms of the conjugate transpose as

$\mathbf{u} \cdot \mathbf{v} = \mathbf{v}^*\mathbf{u}$ (2)


We are now ready to define two new classes of matrices that will be important in our study of diagonalization in $C^n$.


DEFINITION  2 

A square complex matrix A is said to be unitary if

$A^{-1} = A^*$ (3)

and is said to be Hermitian if

$A^* = A$ (4)


Note that a unitary matrix can also be defined as a square complex matrix A for which

$AA^* = A^*A = I$

If A is a real matrix, then $A^* = A^T$, in which case (3) becomes $A^{-1} = A^T$ and (4) becomes $A^T = A$. Thus, the unitary matrices are complex generalizations of the real orthogonal matrices and Hermitian matrices are complex generalizations of the real symmetric matrices.

EXAMPLE  2 Recognizing  Hermitian  Matrices 


Hermitian matrices are easy to recognize because their diagonal entries are real (why?), and the entries that are symmetrically positioned across the main diagonal are complex conjugates. Thus, for example, we can tell by inspection that

$A = \begin{bmatrix} 1 & i & 1+i \\ -i & -5 & 2-i \\ 1-i & 2+i & 3 \end{bmatrix}$

is Hermitian.
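The inspection test described in Example 2 (real diagonal, conjugate pairs across the diagonal) is the same as checking $a_{ij} = \overline{a_{ji}}$ for all i and j. A minimal Python sketch of that check follows; the helper name `is_hermitian`, the tolerance, and the two sample matrices are assumptions of this sketch rather than anything prescribed by the text.

```python
def is_hermitian(a, tol=1e-12):
    """True when a[i][j] equals the conjugate of a[j][i] for every i, j."""
    n = len(a)
    if any(len(row) != n for row in a):
        return False  # only square matrices can be Hermitian
    return all(abs(a[i][j] - a[j][i].conjugate()) <= tol
               for i in range(n) for j in range(n))

# A 3x3 matrix with real diagonal and conjugate pairs across the diagonal.
A = [[1, 1j, 1 + 1j],
     [-1j, -5, 2 - 1j],
     [1 - 1j, 2 + 1j, 3]]

# A matrix with a non-real diagonal entry, so it cannot be Hermitian.
B = [[1j, 1],
     [1, 2]]

print(is_hermitian(A), is_hermitian(B))
```

The diagonal case i = j of the same test forces each diagonal entry to equal its own conjugate, which is why the diagonal must be real.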


The  fact  that  real  symmetric  matrices  have  real  eigenvalues  is  a special  case  of  the  following  more  general 
result  about  Hermitian  matrices,  the  proof  of  which  is  left  for  the  exercises. 


THEOREM  7.5.2 

The  eigenvalues  of  a Hermitian  matrix  are  real  numbers. 


The  fact  that  eigenvectors  from  different  eigenspaces  of  a real  symmetric  matrix  are  orthogonal  is  a special 
case  of  the  following  more  general  result  about  Hermitian  matrices. 


THEOREM  7.5.3 

If  A is  a Hermitian  matrix,  then  eigenvectors  from  different  eigenspaces  are  orthogonal. 


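Proof Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be eigenvectors of A corresponding to distinct eigenvalues $\lambda_1$ and $\lambda_2$. Using Formula 2 and the facts that $\lambda_1 = \overline{\lambda_1}$, $\lambda_2 = \overline{\lambda_2}$, and $A = A^*$, we can write

$\lambda_1(\mathbf{v}_2 \cdot \mathbf{v}_1) = (\lambda_1\mathbf{v}_1)^*\mathbf{v}_2 = (A\mathbf{v}_1)^*\mathbf{v}_2 = (\mathbf{v}_1^*A^*)\mathbf{v}_2 = (\mathbf{v}_1^*A)\mathbf{v}_2 = \mathbf{v}_1^*(A\mathbf{v}_2) = \mathbf{v}_1^*(\lambda_2\mathbf{v}_2) = \lambda_2(\mathbf{v}_1^*\mathbf{v}_2) = \lambda_2(\mathbf{v}_2 \cdot \mathbf{v}_1)$

This implies that $(\lambda_1 - \lambda_2)(\mathbf{v}_2 \cdot \mathbf{v}_1) = 0$ and hence that $\mathbf{v}_2 \cdot \mathbf{v}_1 = 0$ (since $\lambda_1 \neq \lambda_2$).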

EXAMPLE  3 Eigenvalues  and  Eigenvectors  of  a Hermitian  Matrix 

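Confirm that the Hermitian matrix

$A = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}$

has real eigenvalues and that eigenvectors from different eigenspaces are orthogonal.

Solution The characteristic polynomial of A is

$\det(\lambda I - A) = \begin{vmatrix} \lambda-2 & -1-i \\ -1+i & \lambda-3 \end{vmatrix} = (\lambda-2)(\lambda-3) - (-1-i)(-1+i) = (\lambda^2 - 5\lambda + 6) - 2 = (\lambda-1)(\lambda-4)$

so the eigenvalues of A are $\lambda = 1$ and $\lambda = 4$, which are real. Bases for the eigenspaces of A can be obtained by solving the linear system

$\begin{bmatrix} \lambda-2 & -1-i \\ -1+i & \lambda-3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$

with $\lambda = 1$ and with $\lambda = 4$. We leave it for you to do this and to show that the general solutions of these systems are

$\lambda = 1$: $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = t\begin{bmatrix} -1-i \\ 1 \end{bmatrix}$ and $\lambda = 4$: $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = t\begin{bmatrix} \frac{1}{2}(1+i) \\ 1 \end{bmatrix}$

Thus, bases for these eigenspaces are

$\lambda = 1$: $\mathbf{v}_1 = \begin{bmatrix} -1-i \\ 1 \end{bmatrix}$ and $\lambda = 4$: $\mathbf{v}_2 = \begin{bmatrix} \frac{1}{2}(1+i) \\ 1 \end{bmatrix}$

The vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ are orthogonal since

$\mathbf{v}_1 \cdot \mathbf{v}_2 = (-1-i)\overline{\left(\tfrac{1}{2}(1+i)\right)} + (1)(1) = \tfrac{1}{2}(-1-i)(1-i) + 1 = -1 + 1 = 0$

and hence all scalar multiples of them are also orthogonal.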
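The claims of Example 3 can be confirmed numerically. The sketch below assumes the matrix $A$ with rows $(2,\ 1+i)$ and $(1-i,\ 3)$, which is the matrix implied by the characteristic determinant in the example; the helper names `dot` and `mat_vec` are illustrative. It finds the eigenvalues from the characteristic polynomial $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$ and checks the eigenvectors and their orthogonality.

```python
import cmath

def dot(u, v):
    """Complex Euclidean inner product u . v = sum of u_k * conj(v_k)."""
    return sum(uk * complex(vk).conjugate() for uk, vk in zip(u, v))

def mat_vec(a, x):
    """Matrix-vector product A x for a list-of-rows matrix."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]

A = [[2, 1 + 1j],
     [1 - 1j, 3]]

# Roots of the characteristic polynomial of a 2x2 matrix.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
root = cmath.sqrt(trace * trace - 4 * det)
eigenvalues = [(trace - root) / 2, (trace + root) / 2]

v1 = [-1 - 1j, 1]          # basis vector for the eigenvalue 1
v2 = [(1 + 1j) / 2, 1]     # basis vector for the eigenvalue 4

print("eigenvalues:", eigenvalues)
print("A v1 =", mat_vec(A, v1))
print("A v2 =", mat_vec(A, v2))
print("v1 . v2 =", dot(v1, v2))
```

Both eigenvalues come out with zero imaginary part, and the inner product of the two basis vectors is zero, in line with Theorems 7.5.2 and 7.5.3.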


Unitary matrices are not usually easy to recognize by inspection. However, the following analog of Theorems 7.1.1 and 7.1.3, part of which is proved in the exercises, provides a way of ascertaining whether a matrix is unitary without computing its inverse.


THEOREM  7.5.4 

If A is an n × n matrix with complex entries, then the following are equivalent.

(a) A is unitary.

(b) $\|A\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$ in $C^n$.

(c) $A\mathbf{x} \cdot A\mathbf{y} = \mathbf{x} \cdot \mathbf{y}$ for all $\mathbf{x}$ and $\mathbf{y}$ in $C^n$.

(d) The column vectors of A form an orthonormal set in $C^n$ with respect to the complex Euclidean inner product.

(e) The row vectors of A form an orthonormal set in $C^n$ with respect to the complex Euclidean inner product.


EXAMPLE  4 A Unitary  Matrix 

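Use Theorem 7.5.4 to show that

$A = \begin{bmatrix} \frac{1}{2}(1+i) & \frac{1}{2}(1+i) \\ \frac{1}{2}(1-i) & \frac{1}{2}(-1+i) \end{bmatrix}$

is unitary, and then find $A^{-1}$.

Solution We will show that the row vectors

$\mathbf{r}_1 = \begin{bmatrix} \frac{1}{2}(1+i) & \frac{1}{2}(1+i) \end{bmatrix}$ and $\mathbf{r}_2 = \begin{bmatrix} \frac{1}{2}(1-i) & \frac{1}{2}(-1+i) \end{bmatrix}$

are orthonormal. The relevant computations are

$\|\mathbf{r}_1\| = \sqrt{\left|\tfrac{1}{2}(1+i)\right|^2 + \left|\tfrac{1}{2}(1+i)\right|^2} = \sqrt{\tfrac{1}{2} + \tfrac{1}{2}} = 1$

$\|\mathbf{r}_2\| = \sqrt{\left|\tfrac{1}{2}(1-i)\right|^2 + \left|\tfrac{1}{2}(-1+i)\right|^2} = \sqrt{\tfrac{1}{2} + \tfrac{1}{2}} = 1$

$\mathbf{r}_1 \cdot \mathbf{r}_2 = \left(\tfrac{1}{2}(1+i)\right)\overline{\left(\tfrac{1}{2}(1-i)\right)} + \left(\tfrac{1}{2}(1+i)\right)\overline{\left(\tfrac{1}{2}(-1+i)\right)} = \left(\tfrac{1}{2}(1+i)\right)\left(\tfrac{1}{2}(1+i)\right) + \left(\tfrac{1}{2}(1+i)\right)\left(\tfrac{1}{2}(-1-i)\right) = \tfrac{i}{2} - \tfrac{i}{2} = 0$

Since we now know that A is unitary, it follows that

$A^{-1} = A^* = \begin{bmatrix} \frac{1}{2}(1-i) & \frac{1}{2}(1+i) \\ \frac{1}{2}(1-i) & \frac{1}{2}(-1-i) \end{bmatrix}$

You can confirm the validity of this result by showing that $AA^* = A^*A = I$.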
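The confirmation suggested at the end of Example 4 can be sketched in Python. The matrix below is built from the row vectors $\mathbf{r}_1$ and $\mathbf{r}_2$ of the example; the helper names and the tolerance are assumptions of this sketch. For a square matrix it is enough to check $AA^* = I$, since that already forces $A^*A = I$.

```python
def conjugate_transpose(a):
    """Return A* = conj(A)^T for a list-of-rows matrix."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def mat_mul(a, b):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def is_unitary(a, tol=1e-12):
    """True when A A* is the identity (A is assumed to be square)."""
    n = len(a)
    product = mat_mul(a, conjugate_transpose(a))
    return all(abs(product[i][j] - (1 if i == j else 0)) <= tol
               for i in range(n) for j in range(n))

A = [[(1 + 1j) / 2, (1 + 1j) / 2],
     [(1 - 1j) / 2, (-1 + 1j) / 2]]

A_inverse = conjugate_transpose(A)   # valid because A is unitary
print(is_unitary(A))
print(A_inverse)
```

The inverse is obtained with no elimination at all, which is the practical payoff of recognizing a matrix as unitary.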


Unitary  Diagonalizability 


Since  unitary  matrices  are  the  complex  analogs  of  the  real  orthogonal  matrices,  the  following  definition  is  a 
natural  generalization  of  orthogonal  diagonalizability  for  real  matrices. 


DEFINITION  3 


A square complex matrix A is said to be unitarily diagonalizable if there is a unitary matrix P such that $P^*AP = D$ is a complex diagonal matrix. Any such matrix P is said to unitarily diagonalize A.


Recall  that  a real  symmetric  n x n matrix  A has  an  orthonormal  set  of  n eigenvectors  and  is  orthogonally 
diagonalized  by  any  n x n matrix  whose  column  vectors  are  an  orthonormal  set  of  eigenvectors  of  A.  Here  is 
the  complex  analog  of  that  result. 

THEOREM  7.5.5 

Every n × n Hermitian matrix A has an orthonormal set of n eigenvectors and is unitarily diagonalized by any n × n matrix P whose column vectors form an orthonormal set of eigenvectors of A.

The  procedure  for  unitarily  diagonalizing  a Hermitian  matrix  A is  exactly  the  same  as  that  for  orthogonally 
diagonalizing  a symmetric  matrix: 




Unitarily  Diagonalizing  a Hermitian  Matrix 


Step  1.  Find  a basis  for  each  eigenspace  of  A. 

Step  2.  Apply  the  Gram-Schmidt  process  to  each  of  these  bases  to  obtain  orthonormal  bases  for  the 
eigenspaces. 


Step  3.  Form  the  matrix  P whose  column  vectors  are  the  basis  vectors  obtained  in  Step  2.  This  will 
be  a unitary  matrix  (Theorem  7.5.4)  and  will  unitarily  diagonalize  A. 


EXAMPLE  5 Unitary  Diagonalization  of  a Hermitian  Matrix 

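Find a matrix P that unitarily diagonalizes the Hermitian matrix

$A = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}$

Solution We showed in Example 3 that the eigenvalues of A are $\lambda = 1$ and $\lambda = 4$ and that bases for the corresponding eigenspaces are

$\lambda = 1$: $\mathbf{v}_1 = \begin{bmatrix} -1-i \\ 1 \end{bmatrix}$ and $\lambda = 4$: $\mathbf{v}_2 = \begin{bmatrix} \frac{1}{2}(1+i) \\ 1 \end{bmatrix}$

Since each eigenspace has only one basis vector, the Gram-Schmidt process is simply a matter of normalizing these basis vectors. We leave it for you to show that

$\mathbf{p}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} = \begin{bmatrix} \frac{-1-i}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix}$ and $\mathbf{p}_2 = \frac{\mathbf{v}_2}{\|\mathbf{v}_2\|} = \begin{bmatrix} \frac{1+i}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{bmatrix}$

Thus, A is unitarily diagonalized by the matrix

$P = [\mathbf{p}_1 \quad \mathbf{p}_2] = \begin{bmatrix} \frac{-1-i}{\sqrt{3}} & \frac{1+i}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \end{bmatrix}$

Although it is a little tedious, you may want to check this result by showing that

$P^*AP = \begin{bmatrix} \frac{-1+i}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{1-i}{\sqrt{6}} & \frac{2}{\sqrt{6}} \end{bmatrix}\begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}\begin{bmatrix} \frac{-1-i}{\sqrt{3}} & \frac{1+i}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}$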
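The tedious check at the end of Example 5 is a natural job for a short program. The sketch below assumes the matrix $A$ with rows $(2,\ 1+i)$ and $(1-i,\ 3)$ from Example 3 and the normalized eigenvectors $\mathbf{p}_1 = \frac{1}{\sqrt{3}}(-1-i,\ 1)$ and $\mathbf{p}_2 = \frac{1}{\sqrt{6}}(1+i,\ 2)$; the helper names are illustrative. It forms $P^*AP$ and prints the result.

```python
import math

def conjugate_transpose(a):
    """Return A* = conj(A)^T for a list-of-rows matrix."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def mat_mul(a, b):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

s3, s6 = math.sqrt(3), math.sqrt(6)

A = [[2, 1 + 1j],
     [1 - 1j, 3]]

# Columns of P are the unit eigenvectors for the eigenvalues 1 and 4.
P = [[(-1 - 1j) / s3, (1 + 1j) / s6],
     [1 / s3, 2 / s6]]

D = mat_mul(mat_mul(conjugate_transpose(P), A), P)
for row in D:
    print([complex(round(z.real, 10), round(z.imag, 10)) for z in row])
```

Up to rounding error, the printed matrix is diagonal with the eigenvalues 1 and 4 on the diagonal, in the same order as the columns of P.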

Skew-Symmetric  and  Skew-Hermitian  Matrices 

In Exercise 37 of Section 1.7 we defined a square matrix with real entries to be skew-symmetric if $A^T = -A$. A skew-symmetric matrix must have zeros on the main diagonal (why?), and each entry off the main diagonal must be the negative of its mirror image about the main diagonal. Here is an example.

$A = \begin{bmatrix} 0 & 1 & -2 \\ -1 & 0 & 4 \\ 2 & -4 & 0 \end{bmatrix}$ [skew-symmetric]

We leave it for you to confirm that $A^T = -A$.


The complex analogs of the skew-symmetric matrices are the matrices for which $A^* = -A$. Such matrices are said to be skew-Hermitian.

Since a skew-Hermitian matrix A has the property

$\overline{A}^T = -A$

it must be that A has zeros or pure imaginary numbers on the main diagonal (why?), and that the complex conjugate of each entry off the main diagonal is the negative of its mirror image about the main diagonal. Here is an example.

$A = \begin{bmatrix} i & 1-i & 5 \\ -1-i & 2i & i \\ -5 & i & 0 \end{bmatrix}$ [skew-Hermitian]


Normal  Matrices 

Hermitian matrices enjoy many, but not all, of the properties of real symmetric matrices. For example, we know that real symmetric matrices are orthogonally diagonalizable and Hermitian matrices are unitarily diagonalizable. However, whereas the real symmetric matrices are the only orthogonally diagonalizable matrices, the Hermitian matrices do not constitute the entire class of unitarily diagonalizable complex matrices; that is, there exist unitarily diagonalizable matrices that are not Hermitian. Specifically, it can be proved that a square complex matrix A is unitarily diagonalizable if and only if

$AA^* = A^*A$

Matrices  with  this  property  are  said  to  be  normal.  Normal  matrices  include  the  Hermitian,  skew-Hermitian, 
and  unitary  matrices  in  the  complex  case  and  the  symmetric,  skew-symmetric,  and  orthogonal  matrices  in  the 
real  case.  The  nonzero  skew-symmetric  matrices  are  particularly  interesting  because  they  are  examples  of  real 
matrices  that  are  not  orthogonally  diagonalizable  but  are  unitarily  diagonalizable. 
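The normality condition $AA^* = A^*A$ is easy to test directly. In the Python sketch below, the helper names and the sample matrices are assumptions made for illustration: a real skew-symmetric matrix, a skew-Hermitian matrix, and a shear matrix that is not normal. For the 2 × 2 skew-symmetric matrix the eigenvalues are also computed from the characteristic polynomial, to show that they are not real.

```python
import cmath

def conjugate_transpose(a):
    """Return A* = conj(A)^T for a list-of-rows matrix."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def mat_mul(a, b):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def is_normal(a, tol=1e-12):
    """True when A A* and A* A agree entry by entry."""
    a_star = conjugate_transpose(a)
    left, right = mat_mul(a, a_star), mat_mul(a_star, a)
    n = len(a)
    return all(abs(left[i][j] - right[i][j]) <= tol
               for i in range(n) for j in range(n))

skew_symmetric = [[0, 1], [-1, 0]]
skew_hermitian = [[1j, 1 - 1j, 5], [-1 - 1j, 2j, 1j], [-5, 1j, 0]]
shear = [[1, 1], [0, 1]]          # a standard example of a non-normal matrix

print(is_normal(skew_symmetric), is_normal(skew_hermitian), is_normal(shear))

# Eigenvalues of the 2x2 skew-symmetric matrix: roots of t^2 - tr*t + det.
tr = skew_symmetric[0][0] + skew_symmetric[1][1]
det = (skew_symmetric[0][0] * skew_symmetric[1][1]
       - skew_symmetric[0][1] * skew_symmetric[1][0])
root = cmath.sqrt(tr * tr - 4 * det)
eigenvalues = [(tr + root) / 2, (tr - root) / 2]
print(eigenvalues)
```

The skew-symmetric matrix is normal but has the non-real eigenvalues i and -i, so no real orthogonal matrix can diagonalize it, while a unitary matrix can.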


A Comparison  of  Eigenvalues 

We  have  seen  that  Hermitian  matrices  have  real  eigenvalues.  In  the  exercises  we  will  ask  you  to  show  that  the 
eigenvalues  of  a skew-Hermitian  matrix  are  either  zero  or  purely  imaginary  (have  real  part  of  zero)  and  that  the 
eigenvalues  of  unitary  matrices  have  modulus  1.  These  ideas  are  illustrated  schematically  in  Figure  7.5.1. 


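[Figure 7.5.1: the complex plane, showing real eigenvalues (Hermitian) on the real axis, pure imaginary eigenvalues (skew-Hermitian) on the imaginary axis, and eigenvalues with |λ| = 1 (unitary) on the unit circle]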


Concept  Review 

Conjugate  transpose 
Unitary  matrix 
Hermitian  matrix 
Unitarily  diagonalizable  matrix 
Skew-symmetric  matrix 
Skew-Hermitian  matrix 
Normal  matrix 

Skills 

Find  the  conjugate  transpose  of  a matrix. 

Be  able  to  identify  Hermitian  matrices. 

Find  the  inverse  of  a unitary  matrix. 

Find  a unitary  matrix  that  diagonalizes  a Hermitian  matrix. 


Exercise  Set  7.5 

In  Exercises  1-2,  find  A* . 


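1. $A = \begin{bmatrix} 2i & 1-i \\ 4 & 3+i \\ 5+i & 0 \end{bmatrix}$

Answer:

$A^* = \begin{bmatrix} -2i & 4 & 5-i \\ 1+i & 3-i & 0 \end{bmatrix}$

2. $A = \begin{bmatrix} 2i & 1-i & -1+i \\ 4 & 5-7i & -i \end{bmatrix}$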

In  Exercises  3-4,  substitute  numbers  for  the  x's  so  that  A is  Hermitian. 

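3. $A = \begin{bmatrix} 1 & i & 2-3i \\ x & -3 & 1 \\ x & x & 2 \end{bmatrix}$

Answer:

$A = \begin{bmatrix} 1 & i & 2-3i \\ -i & -3 & 1 \\ 2+3i & 1 & 2 \end{bmatrix}$

4. $A = \begin{bmatrix} 2 & 0 & 3+5i \\ x & -4 & -i \\ x & x & 6 \end{bmatrix}$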

In  Exercises  5-6,  show  that  A is  not  Hermitian  for  any  choice  of  the  x's. 

5. (a) $A = \begin{bmatrix} 1 & i & 2-3i \\ -i & -3 & x \\ 2-3i & x & x \end{bmatrix}$

(b) $A = \begin{bmatrix} x & x & 3+5i \\ 0 & i & -i \\ 3-5i & i & x \end{bmatrix}$

Answer: 


In Exercises 9-12, show that A is unitary, and find $A^{-1}$.


Answer: 


In Exercises 13-18, find a unitary matrix P that diagonalizes A, and determine $P^{-1}AP$.

Answer:


— 1 + i 


& 


14. $A = \begin{bmatrix} 3 & -i \\ i & 3 \end{bmatrix}$

15. $A = \begin{bmatrix} 6 & 2+2i \\ 2-2i & 4 \end{bmatrix}$


D = 


3 0 

0 6 


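Answer:

$P = \begin{bmatrix} \frac{-1-i}{\sqrt{6}} & \frac{1+i}{\sqrt{3}} \\ \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{bmatrix}$, $P^{-1}AP = D = \begin{bmatrix} 2 & 0 \\ 0 & 8 \end{bmatrix}$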


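16. $A = \begin{bmatrix} 0 & 3+i \\ 3-i & -3 \end{bmatrix}$

17. $A = \begin{bmatrix} 5 & 0 & 0 \\ 0 & -1 & -1+i \\ 0 & -1-i & 0 \end{bmatrix}$

Answer:

$P = \begin{bmatrix} 0 & 0 & 1 \\ \frac{1-i}{\sqrt{3}} & \frac{-1+i}{\sqrt{6}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}} & 0 \end{bmatrix}$, $P^{-1}AP = \begin{bmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 5 \end{bmatrix}$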

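18. $A = \begin{bmatrix} 2 & \frac{i}{\sqrt{2}} & -\frac{i}{\sqrt{2}} \\ -\frac{i}{\sqrt{2}} & 2 & 0 \\ \frac{i}{\sqrt{2}} & 0 & 2 \end{bmatrix}$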

In  Exercises  19-20,  substitute  numbers  for  the  x's  so  that  A is  skew-Hermitian. 

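19. $A = \begin{bmatrix} 0 & i & 2-3i \\ x & 0 & 1 \\ x & x & 4i \end{bmatrix}$

Answer:

$A = \begin{bmatrix} 0 & i & 2-3i \\ i & 0 & 1 \\ -2-3i & -1 & 4i \end{bmatrix}$

20. $A = \begin{bmatrix} 0 & 0 & 3-5i \\ x & 0 & -i \\ x & x & 0 \end{bmatrix}$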

In  Exercises  21-22,  show  that  A is  not  skew-Hermitian  for  any  choice  of  the  x's. 


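21. (a) $A = \begin{bmatrix} 0 & i & 2-3i \\ -i & 0 & x \\ 2+3i & x & x \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & x & 3-5i \\ x & 2i & -i \\ -3+5i & i & 3i \end{bmatrix}$

Answer:

(a) $a_{13} \neq -\overline{a_{31}}$

(b) $a_{11} \neq -\overline{a_{11}}$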


22. (a) $A = \begin{bmatrix} i & x & 2-3i \\ x & 0 & 1+i \\ 2+3i & -1-i & x \end{bmatrix}$

(b) $A = \begin{bmatrix} 0 & -i & 4+7i \\ x & 0 & x \\ -4-7i & x & 1 \end{bmatrix}$


In  Exercises  23-24,  verify  that  the  eigenvalues  of  the  skew-Hermitian  matrix  A are  pure  imaginary  numbers. 


23. $A = \begin{bmatrix} 0 & -1+i \\ 1+i & i \end{bmatrix}$

24. $A = \begin{bmatrix} 0 & 3i \\ 3i & 0 \end{bmatrix}$


In  Exercises  25-26,  show  that  A is  normal. 


25. $A = \begin{bmatrix} 1+2i & 2+i & -2-i \\ 2+i & 1+i & -i \\ -2-i & -i & 1+i \end{bmatrix}$

26. $A = \begin{bmatrix} 2+2i & i & 1-i \\ i & -2i & 1-3i \\ 1-i & 1-3i & -3+8i \end{bmatrix}$

27. Show that the matrix

is unitary for all real values of θ. [Note: See Formula 17 in Appendix B for the definition of $e^{i\theta}$.]

28. Prove that each entry on the main diagonal of a skew-Hermitian matrix is either zero or a pure imaginary number.

29. Let A be any n × n matrix with complex entries, and define the matrices B and C to be

$B = \frac{1}{2}(A + A^*)$ and $C = \frac{1}{2i}(A - A^*)$

(a) Show that B and C are Hermitian.

(b) Show that $A = B + iC$ and $A^* = B - iC$.

(c) What condition must B and C satisfy for A to be normal?

Answer:

(c) B and C must commute.

30. Show that if A is an n × n matrix with complex entries, and if $\mathbf{u}$ and $\mathbf{v}$ are vectors in $C^n$ that are expressed in column form, then

$A\mathbf{u} \cdot \mathbf{v} = \mathbf{u} \cdot A^*\mathbf{v}$ and $\mathbf{u} \cdot A\mathbf{v} = A^*\mathbf{u} \cdot \mathbf{v}$

31. Show that if A is a unitary matrix, then so is $A^*$.

32. Show that the eigenvalues of a skew-Hermitian matrix are either zero or purely imaginary.

33. Show that the eigenvalues of a unitary matrix have modulus 1.

34. Show that if $\mathbf{u}$ is a nonzero vector in $C^n$ that is expressed in column form, then $P = \mathbf{u}\mathbf{u}^*$ is Hermitian.

35. Show that if $\mathbf{u}$ is a unit vector in $C^n$ that is expressed in column form, then $H = I - 2\mathbf{u}\mathbf{u}^*$ is Hermitian and unitary.

36. What can you say about the inverse of a matrix A that is both Hermitian and unitary?

37. Find a 2 × 2 matrix that is both Hermitian and unitary and whose entries are not all real numbers.

Answer:

$\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{i}{\sqrt{2}} \\ \frac{i}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix}$

38. Under what conditions is the following matrix normal?

$A = \begin{bmatrix} a & 0 & 0 \\ 0 & 0 & c \\ 0 & b & 0 \end{bmatrix}$

39. What geometric interpretations might you reasonably give to multiplication by the matrices $P = \mathbf{u}\mathbf{u}^*$ and $H = I - 2\mathbf{u}\mathbf{u}^*$ in Exercises 34 and 35?

Answer:

Multiplication of $\mathbf{x}$ by P corresponds to $\|\mathbf{u}\|^2$ times the orthogonal projection of $\mathbf{x}$ onto $W = \text{span}\{\mathbf{u}\}$. If $\|\mathbf{u}\| = 1$, then multiplication of $\mathbf{x}$ by $H = I - 2\mathbf{u}\mathbf{u}^*$ corresponds to reflection of $\mathbf{x}$ about the hyperplane $\mathbf{u}^\perp$.

40. Prove that if A is an invertible matrix, then $A^*$ is invertible, and $(A^*)^{-1} = (A^{-1})^*$.

41. (a) Prove that $\det(\overline{A}) = \overline{\det(A)}$.

(b) Use the result in part (a) and the fact that a square matrix and its transpose have the same determinant to prove that $\det(A^*) = \overline{\det(A)}$.

42. Use part (b) of Exercise 41 to prove:

(a) If A is Hermitian, then det(A) is real.

(b) If A is unitary, then $|\det(A)| = 1$.

43.  Use  properties  of  the  transpose  and  complex  conjugate  to  prove  parts  ( a ) and  (e)  of  Theorem  7.5.1. 

44.  Use  properties  of  the  transpose  and  complex  conjugate  to  prove  parts  ( b ) and  (d)  of  Theorem  7.5.1. 

45. Prove that an n × n matrix A with complex entries is unitary if and only if the columns of A form an orthonormal set in $C^n$.

46.  Prove  that  the  eigenvalues  of  a Hermitian  matrix  are  real. 

True-False  Exercises 


In  parts  (a)-(e)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a) The matrix $\begin{bmatrix} 0 & i \\ 1 & 2 \end{bmatrix}$ is Hermitian.

Answer:

False

(b) 


The  matrix 


{2  {l  {3 

0 J_ 

fe  f3 

i i i 

{2  {I  {2 


is  unitary. 


Answer: 

False 

(c)  The  conjugate  transpose  of  a unitary  matrix  is  unitary. 


Answer: 


True 

(d)  Every  unitarily  diagonalizable  matrix  is  Hermitian. 

Answer: 

False 

(e)  A positive  integer  power  of  a skew-Hermitian  matrix  is  skew-Hermitian. 
Answer: 

False 
Supplementary Exercises


1.  Verify  that  each  matrix  is  orthogonal,  and  find  its  inverse. 


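(a) $\begin{bmatrix} \frac{3}{5} & -\frac{4}{5} \\ \frac{4}{5} & \frac{3}{5} \end{bmatrix}$

(b) $\begin{bmatrix} \frac{4}{5} & 0 & -\frac{3}{5} \\ -\frac{9}{25} & \frac{4}{5} & -\frac{12}{25} \\ \frac{12}{25} & \frac{3}{5} & \frac{16}{25} \end{bmatrix}$

Answer:

(a) $\begin{bmatrix} \frac{3}{5} & -\frac{4}{5} \\ \frac{4}{5} & \frac{3}{5} \end{bmatrix}^{-1} = \begin{bmatrix} \frac{3}{5} & \frac{4}{5} \\ -\frac{4}{5} & \frac{3}{5} \end{bmatrix}$

(b) $\begin{bmatrix} \frac{4}{5} & 0 & -\frac{3}{5} \\ -\frac{9}{25} & \frac{4}{5} & -\frac{12}{25} \\ \frac{12}{25} & \frac{3}{5} & \frac{16}{25} \end{bmatrix}^{-1} = \begin{bmatrix} \frac{4}{5} & -\frac{9}{25} & \frac{12}{25} \\ 0 & \frac{4}{5} & \frac{3}{5} \\ -\frac{3}{5} & -\frac{12}{25} & \frac{16}{25} \end{bmatrix}$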

2. Prove: If Q is an orthogonal matrix, then each entry of Q is the same as its cofactor if det(Q) = 1 and is the negative of its cofactor if det(Q) = -1.

3. Prove that if A is a positive definite symmetric matrix, and if $\mathbf{u}$ and $\mathbf{v}$ are vectors in $R^n$ in column form, then

$\langle\mathbf{u}, \mathbf{v}\rangle = \mathbf{u}^T A\mathbf{v}$

is an inner product on $R^n$.

4.  Find  the  characteristic  polynomial  and  the  dimensions  of  the  eigenspaces  of  the  symmetric  matrix 

$\begin{bmatrix} 3 & 2 & 2 \\ 2 & 3 & 2 \\ 2 & 2 & 3 \end{bmatrix}$


5.  Find  a matrix  P that  orthogonally  diagonalizes 


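$A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$

and determine the diagonal matrix $D = P^TAP$.

Answer:

$P = \begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \end{bmatrix}$; $P^TAP = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$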

6. Express each quadratic form in the matrix notation $\mathbf{x}^T A\mathbf{x}$.

(a) $-4x_1^2 + 16x_2^2 - 15x_1x_2$

(b) $9x_1^2 - x_2^2 + 4x_3^2 + 6x_1x_2 - 8x_1x_3 + x_2x_3$

7. Classify the quadratic form

$x_1^2 - 3x_1x_2 + 4x_2^2$

as  positive  definite,  negative  definite,  indefinite,  positive  semidefinite,  or  negative  semidefinite. 

Answer: 
positive  definite 

8.  Find  an  orthogonal  change  of  variable  that  eliminates  the  cross  product  terms  in  each  quadratic  form,  and 
express  the  quadratic  form  in  terms  of  the  new  variables. 

(a) $-3x_1^2 + 5x_2^2 + 2x_1x_2$

(b) $-5x_1^2 + x_2^2 - x_3^2 + 6x_1x_3 + 4x_1x_2$

9.  Identify  the  type  of  conic  section  represented  by  each  equation. 

(a)  y — x2  = 0 

(b) $3x - 11y^2 = 0$

Answer: 

(a)  parabola 

(b)  parabola 

10.  Find  a unitary  matrix  U that  diagonalizes 

$A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$

and determine the diagonal matrix $D = U^{-1}AU$.

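11. Show that if U is an n × n unitary matrix and

$|z_1| = |z_2| = \cdots = |z_n| = 1$

then the product

$U\begin{bmatrix} z_1 & 0 & 0 & \cdots & 0 \\ 0 & z_2 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & z_n \end{bmatrix}$

is also unitary.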

12. Suppose that $A^* = -A$.

(a)  Show  that  iA  is  Hermitian. 

(b)  Show  that  A is  unitarily  diagonalizable  and  has  pure  imaginary  eigenvalues. 
CHAPTER 8


Linear  Transformations 


CHAPTER  CONTENTS 

General  Linear  Transformations 
Isomorphism 

Compositions  and  Inverse  Transformations 
Matrices  for  General  Linear  Transformations 
Similarity 


INTRODUCTION 

In  Section  4.9  and  Section  4.10  we  studied  linear  transformations  from  Rn  to  Rm.  In  this 
chapter  we  will  define  and  study  linear  transformations  from  a general  vector  space  V to  a 
general  vector  space  W.  The  results  we  obtain  here  have  important  applications  in  physics, 
engineering,  and  various  branches  of  mathematics. 
8.1 General Linear Transformations

Up to now our study of linear transformations has focused on transformations from $R^n$ to $R^m$. In this section we will turn our attention to linear transformations involving general vector spaces. We will illustrate ways in which such transformations arise, and we will establish a fundamental relationship between general n-dimensional vector spaces and $R^n$.


Definitions  and  Terminology 

In Section 4.9 we defined a matrix transformation $T_A: R^n \to R^m$ to be a mapping of the form

$T_A(\mathbf{x}) = A\mathbf{x}$

in which A is an m × n matrix. We subsequently established in Theorem 4.10.2 and Theorem 4.10.3 that the matrix transformations are precisely the linear transformations from $R^n$ to $R^m$; that is, the transformations with the linearity properties

$T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ and $T(k\mathbf{u}) = kT(\mathbf{u})$

We  will  use  these  two  properties  as  the  starting  point  for  defining  more  general  linear  transformations. 


DEFINITION  1 

If $T: V \to W$ is a function from a vector space V to a vector space W, then T is called a linear transformation from V to W if the following two properties hold for all vectors $\mathbf{u}$ and $\mathbf{v}$ in V and for all scalars k:

(i) $T(k\mathbf{u}) = kT(\mathbf{u})$ [Homogeneity property]

(ii) $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ [Additivity property]

In the special case where V = W, the linear transformation T is called a linear operator on the vector space V.


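The homogeneity and additivity properties of a linear transformation $T: V \to W$ can be used in combination to show that if $\mathbf{v}_1$ and $\mathbf{v}_2$ are vectors in V and $k_1$ and $k_2$ are any scalars, then

$T(k_1\mathbf{v}_1 + k_2\mathbf{v}_2) = k_1T(\mathbf{v}_1) + k_2T(\mathbf{v}_2)$

More generally, if $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ are vectors in V and $k_1, k_2, \ldots, k_r$ are any scalars, then

$T(k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_r\mathbf{v}_r) = k_1T(\mathbf{v}_1) + k_2T(\mathbf{v}_2) + \cdots + k_rT(\mathbf{v}_r)$ (1)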


The  following  theorem  is  an  analog  of  parts  ( a ) and  (d)  of  Theorem  4.9.1. 


THEOREM  8.1.1 


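If $T: V \to W$ is a linear transformation, then:

(a) $T(\mathbf{0}) = \mathbf{0}$.

(b) $T(\mathbf{u} - \mathbf{v}) = T(\mathbf{u}) - T(\mathbf{v})$ for all $\mathbf{u}$ and $\mathbf{v}$ in V.


Proof Let $\mathbf{u}$ be any vector in V. Since $0\mathbf{u} = \mathbf{0}$, it follows from the homogeneity property in Definition 1 that

$T(\mathbf{0}) = T(0\mathbf{u}) = 0T(\mathbf{u}) = \mathbf{0}$

which proves (a).

We can prove part (b) by rewriting $T(\mathbf{u} - \mathbf{v})$ as

$T(\mathbf{u} - \mathbf{v}) = T(\mathbf{u} + (-1)\mathbf{v}) = T(\mathbf{u}) + (-1)T(\mathbf{v}) = T(\mathbf{u}) - T(\mathbf{v})$

We leave it for you to justify each step.


Use the two parts of Theorem 8.1.1 to prove that

$T(-\mathbf{v}) = -T(\mathbf{v})$

for all $\mathbf{v}$ in V.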


EXAMPLE  1 Matrix  Transformations 

Because we have based the definition of a general linear transformation on the homogeneity and additivity properties of matrix transformations, it follows that a matrix transformation $T_A: R^n \to R^m$ is also a linear transformation in this more general sense with $V = R^n$ and $W = R^m$.


EXAMPLE  2 The  Zero  Transformation 


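Let V and W be any two vector spaces. The mapping $T: V \to W$ such that $T(\mathbf{v}) = \mathbf{0}$ for every $\mathbf{v}$ in V is a linear transformation called the zero transformation. To see that T is linear, observe that

$T(\mathbf{u} + \mathbf{v}) = \mathbf{0}$, $T(\mathbf{u}) = \mathbf{0}$, $T(\mathbf{v}) = \mathbf{0}$, and $T(k\mathbf{u}) = \mathbf{0}$

Therefore,

$T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ and $T(k\mathbf{u}) = kT(\mathbf{u})$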


EXAMPLE  3 The  Identity  Operator 

Let V be any vector space. The mapping $I: V \to V$ defined by $I(\mathbf{v}) = \mathbf{v}$ is called the identity operator on V. We will leave it for you to verify that I is linear.


EXAMPLE  4 Dilation  and  Contraction  Operators 


If V is a vector space and k is any scalar, then the mapping $T: V \to V$ given by $T(\mathbf{x}) = k\mathbf{x}$ is a linear operator on V, for if c is any scalar and if $\mathbf{u}$ and $\mathbf{v}$ are any vectors in V, then

$T(c\mathbf{u}) = k(c\mathbf{u}) = c(k\mathbf{u}) = cT(\mathbf{u})$

$T(\mathbf{u} + \mathbf{v}) = k(\mathbf{u} + \mathbf{v}) = k\mathbf{u} + k\mathbf{v} = T(\mathbf{u}) + T(\mathbf{v})$

If $0 < k < 1$, then T is called the contraction of V with factor k, and if $k > 1$, it is called the dilation of V with factor k (Figure 8.1.1).


Figure  8.1.1 


EXAMPLE  5 A Linear  Transformation  from  Pn  to  Pn  + i 

Let $\mathbf{p} = p(x) = c_0 + c_1x + \cdots + c_nx^n$ be a polynomial in $P_n$, and define the transformation $T: P_n \to P_{n+1}$ by

$T(\mathbf{p}) = T(p(x)) = xp(x) = c_0x + c_1x^2 + \cdots + c_nx^{n+1}$

This transformation is linear because for any scalar k and any polynomials $\mathbf{p}_1$ and $\mathbf{p}_2$ in $P_n$ we have

$T(k\mathbf{p}) = T(kp(x)) = x(kp(x)) = k(xp(x)) = kT(\mathbf{p})$

and

$T(\mathbf{p}_1 + \mathbf{p}_2) = T(p_1(x) + p_2(x)) = x(p_1(x) + p_2(x)) = xp_1(x) + xp_2(x) = T(\mathbf{p}_1) + T(\mathbf{p}_2)$


EXAMPLE  6 A Linear  Transformation  Using  an  Inner  Product 

Let V be an inner product space, let v_0 be any fixed vector in V, and let T: V → R be the transformation

T(x) = ⟨x, v_0⟩

that maps a vector x into its inner product with v_0. This transformation is linear, for if k is any scalar, and
if u and v are any vectors in V, then it follows from properties of inner products that

T(ku) = ⟨ku, v_0⟩ = k⟨u, v_0⟩ = kT(u)

T(u + v) = ⟨u + v, v_0⟩ = ⟨u, v_0⟩ + ⟨v, v_0⟩ = T(u) + T(v)


EXAMPLE  7 Transformations  on  Matrix  Spaces 


Let M_nn be the vector space of n × n matrices. In each part determine whether the transformation is
linear.

(a) T_1(A) = A^T

(b) T_2(A) = det(A)

Solution

(a) It follows from parts (b) and (d) of Theorem 1.4.8 that

T_1(kA) = (kA)^T = kA^T = kT_1(A)

T_1(A + B) = (A + B)^T = A^T + B^T = T_1(A) + T_1(B)

so T_1 is linear.

(b) It follows from Formula 1 of Section 2.3 that

T_2(kA) = det(kA) = k^n det(A) = k^n T_2(A)

Thus, T_2 is not homogeneous and hence not linear if n > 1. Note that additivity also fails
because we showed in Example 1 of Section 2.3 that det(A + B) and det(A) + det(B) are not
generally equal.
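The two conclusions of Example 7 can be spot-checked numerically. The following is a minimal sketch for 2 × 2 matrices stored as nested lists; the helper names (transpose, det2, scale, add) and the sample matrices are illustrative assumptions, not part of the text, and a check on one sample does not replace the proof.

```python
# Hypothetical helpers for 2 x 2 matrices stored as nested lists.

def transpose(a):
    """T1(A) = A^T."""
    return [list(row) for row in zip(*a)]

def det2(a):
    """T2(A) = det(A) for a 2 x 2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def scale(k, a):
    """Scalar multiple kA."""
    return [[k * x for x in row] for row in a]

def add(a, b):
    """Matrix sum A + B."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Arbitrary sample data (an assumption made for illustration).
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
k = 3

# T1 satisfies homogeneity and additivity on this sample.
t1_homogeneous = transpose(scale(k, A)) == scale(k, transpose(A))
t1_additive = transpose(add(A, B)) == add(transpose(A), transpose(B))

# T2 fails homogeneity: for n = 2, det(kA) = k^2 det(A), not k det(A).
det_kA = det2(scale(k, A))
k_det_A = k * det2(A)

# T2 also fails additivity on this sample.
det_sum = det2(add(A, B))
sum_det = det2(A) + det2(B)

print(t1_homogeneous, t1_additive, det_kA, k_det_A, det_sum, sum_det)
```

Here det(kA) equals k^2 det(A), in agreement with the formula T_2(kA) = k^n T_2(A) for n = 2.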


EXAMPLE  8 Translation  Is  Not  Linear 

Part (a) of Theorem 8.1.1 states that a linear transformation maps 0 to 0. This property is useful for
identifying transformations that are not linear. For example, if x_0 is a fixed nonzero vector in R^2, then
the transformation

T(x) = x + x_0

has the geometric effect of translating each point x in a direction parallel to x_0 through a distance of
||x_0|| (Figure 8.1.2). This cannot be a linear transformation since T(0) = x_0, so T does not map 0 to 0.

Figure 8.1.2 T(x) = x + x_0 translates each point x along a line parallel to x_0 through a distance ||x_0||.


EXAMPLE  9 The  Evaluation  Transformation 


Let V be a subspace of F(-∞, ∞), let

x_1, x_2, ..., x_n

be distinct real numbers, and let T: V → R^n be the transformation

T(f) = (f(x_1), f(x_2), ..., f(x_n))     (2)

that associates with f the n-tuple of function values at x_1, x_2, ..., x_n. We call this the evaluation
transformation on V at x_1, x_2, ..., x_n. Thus, for example, if

x_1 = -1, x_2 = 2, x_3 = 4

and if f(x) = x^2 - 1, then

T(f) = (f(x_1), f(x_2), f(x_3)) = (0, 3, 15)

The evaluation transformation in (2) is linear, for if k is any scalar, and if f and g are any functions in V,
then

T(kf) = ((kf)(x_1), (kf)(x_2), ..., (kf)(x_n))
      = (kf(x_1), kf(x_2), ..., kf(x_n))
      = k(f(x_1), f(x_2), ..., f(x_n)) = kT(f)

and

T(f + g) = ((f + g)(x_1), (f + g)(x_2), ..., (f + g)(x_n))
         = (f(x_1) + g(x_1), f(x_2) + g(x_2), ..., f(x_n) + g(x_n))
         = (f(x_1), f(x_2), ..., f(x_n)) + (g(x_1), g(x_2), ..., g(x_n))
         = T(f) + T(g)
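The evaluation transformation of Example 9 is easy to experiment with. The sketch below builds T for the sample points -1, 2, 4 used in the example and checks homogeneity and additivity on one pair of functions; the function name evaluation, the second function g, and the scalar k are assumptions chosen only for illustration.

```python
def evaluation(points):
    """Return T, where T(f) = (f(x_1), ..., f(x_n)) for the given points."""
    def T(f):
        return tuple(f(x) for x in points)
    return T

T = evaluation([-1, 2, 4])

f = lambda x: x**2 - 1      # the function used in Example 9
g = lambda x: 3 * x + 2     # a second sample function (an assumption)
k = 5                       # a sample scalar (an assumption)

image_f = T(f)

# Homogeneity: T(kf) compared with kT(f).
lhs_hom = T(lambda x: k * f(x))
rhs_hom = tuple(k * a for a in T(f))

# Additivity: T(f + g) compared with T(f) + T(g).
lhs_add = T(lambda x: f(x) + g(x))
rhs_add = tuple(a + b for a, b in zip(T(f), T(g)))

print(image_f, lhs_hom == rhs_hom, lhs_add == rhs_add)
```

The tuple image_f reproduces the value (0, 3, 15) computed in the example.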


Finding  Linear  Transformations  from  Images  of  Basis  Vectors 

We saw in Formula (12) of Section 4.9 that if T: R^n → R^m is a matrix transformation, say multiplication by A, and
if e_1, e_2, ..., e_n are the standard basis vectors for R^n, then A can be expressed as

A = [T(e_1) | T(e_2) | ... | T(e_n)]

It follows from this that the image of any vector v = (c_1, c_2, ..., c_n) in R^n under multiplication by A can be
expressed as

T(v) = c_1 T(e_1) + c_2 T(e_2) + ... + c_n T(e_n)

This formula tells us that for a matrix transformation the image of any vector is expressible as a linear combination
of the images of the standard basis vectors. This is a special case of the following more general result.


THEOREM  8.1.2 


Let T: V → W be a linear transformation, where V is finite-dimensional. If S = {v_1, v_2, ..., v_n} is a basis
for V, then the image of any vector v in V can be expressed as

T(v) = c_1 T(v_1) + c_2 T(v_2) + ... + c_n T(v_n)     (3)

where c_1, c_2, ..., c_n are the coefficients required to express v as a linear combination of the vectors in S.

Proof Express v as v = c_1 v_1 + c_2 v_2 + ... + c_n v_n and use the linearity of T.


EXAMPLE 10 Computing with Images of Basis Vectors

Consider the basis S = {v_1, v_2, v_3} for R^3, where

v_1 = (1, 1, 1), v_2 = (1, 1, 0), v_3 = (1, 0, 0)

Let T: R^3 → R^2 be the linear transformation for which

T(v_1) = (1, 0), T(v_2) = (2, -1), T(v_3) = (4, 3)

Find a formula for T(x_1, x_2, x_3), and then use that formula to compute T(2, -3, 5).

Solution We first need to express x = (x_1, x_2, x_3) as a linear combination of v_1, v_2, and v_3. If we
write

(x_1, x_2, x_3) = c_1(1, 1, 1) + c_2(1, 1, 0) + c_3(1, 0, 0)

then on equating corresponding components, we obtain

c_1 + c_2 + c_3 = x_1
c_1 + c_2       = x_2
c_1             = x_3

which yields c_1 = x_3, c_2 = x_2 - x_3, c_3 = x_1 - x_2, so

(x_1, x_2, x_3) = x_3(1, 1, 1) + (x_2 - x_3)(1, 1, 0) + (x_1 - x_2)(1, 0, 0)
                = x_3 v_1 + (x_2 - x_3) v_2 + (x_1 - x_2) v_3

Thus

T(x_1, x_2, x_3) = x_3 T(v_1) + (x_2 - x_3) T(v_2) + (x_1 - x_2) T(v_3)
                 = x_3(1, 0) + (x_2 - x_3)(2, -1) + (x_1 - x_2)(4, 3)
                 = (4x_1 - 2x_2 - x_3, 3x_1 - 4x_2 + x_3)

From this formula, we obtain

T(2, -3, 5) = (9, 23)
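The procedure of Example 10 (find the coordinates of x relative to S, then combine the images of the basis vectors as in Formula (3)) can be sketched in code. This is a minimal illustration using exact rational arithmetic; the function names coordinates and apply_T are hypothetical, and the solver assumes that the given vectors really do form a basis of R^n.

```python
from fractions import Fraction

def coordinates(basis, x):
    """Solve c_1 v_1 + ... + c_n v_n = x by Gauss-Jordan elimination.

    Assumes `basis` is a basis of R^n (n vectors, each of length n).
    """
    n = len(basis)
    # Augmented matrix: columns are the basis vectors, last column is x.
    m = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(x[i])]
         for i in range(n)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so the pivot entry is 1.
        m[col] = [entry / m[col][col] for entry in m[col]]
        # Clear the column in every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [row[n] for row in m]

def apply_T(basis, images, x):
    """T(x) = c_1 T(v_1) + ... + c_n T(v_n), as in Formula (3)."""
    c = coordinates(basis, x)
    dim_w = len(images[0])
    return tuple(sum(c[j] * images[j][i] for j in range(len(basis)))
                 for i in range(dim_w))

basis = [(1, 1, 1), (1, 1, 0), (1, 0, 0)]   # v_1, v_2, v_3
images = [(1, 0), (2, -1), (4, 3)]          # T(v_1), T(v_2), T(v_3)

coords = coordinates(basis, (2, -3, 5))
result = apply_T(basis, images, (2, -3, 5))
print(coords, result)
```

For x = (2, -3, 5) the coordinates are c_1 = 5, c_2 = -8, c_3 = 5, which agree with c_1 = x_3, c_2 = x_2 - x_3, c_3 = x_1 - x_2, and the image agrees with the value found from the formula.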


CALCULUS  REQUIRED 

EXAMPLE 11 A Linear Transformation from C^1(-∞, ∞) to F(-∞, ∞)

Let V = C^1(-∞, ∞) be the vector space of functions with continuous first derivatives on (-∞, ∞), and let
W = F(-∞, ∞) be the vector space of all real-valued functions defined on (-∞, ∞). Let D: V → W be the
transformation that maps a function f = f(x) into its derivative; that is,

D(f) = f'(x)

From the properties of differentiation, we have

D(f + g) = D(f) + D(g) and D(kf) = kD(f)

Thus, D is a linear transformation.


CALCULUS  REQUIRED 

EXAMPLE  12  An  Integral  Transformation 


Let V = C(-∞, ∞) be the vector space of continuous functions on the interval (-∞, ∞), let
W = C^1(-∞, ∞) be the vector space of functions with continuous first derivatives on (-∞, ∞), and
let J: V → W be the transformation that maps a function f in V into

J(f) = ∫_0^x f(t) dt

For example, if f(x) = x^2, then

J(f) = ∫_0^x t^2 dt = x^3/3

The transformation J: V → W is linear, for if k is any constant, and if f and g are any functions in V, then
properties of the integral imply that

J(kf) = ∫_0^x kf(t) dt = k ∫_0^x f(t) dt = kJ(f)

J(f + g) = ∫_0^x (f(t) + g(t)) dt = ∫_0^x f(t) dt + ∫_0^x g(t) dt = J(f) + J(g)

Kernel  and  Range 

Recall that if A is an m × n matrix, then the null space of A consists of all vectors x in R^n such that Ax = 0, and by
Theorem 4.7.1 the column space of A consists of all vectors b in R^m for which there is at least one vector x in R^n
such that Ax = b. From the viewpoint of matrix transformations, the null space of A consists of all vectors in R^n
that multiplication by A maps into 0, and the column space of A consists of all vectors in R^m that are images of at
least one vector in R^n under multiplication by A. The following definition extends these ideas to general linear
transformations.


DEFINITION  2 

If T: V → W is a linear transformation, then the set of vectors in V that T maps into 0 is called the kernel of
T and is denoted by ker(T). The set of all vectors in W that are images under T of at least one vector in V is
called the range of T and is denoted by R(T).


EXAMPLE  13  Kernel  and  Range  of  a Matrix  Transformation 

If T_A: R^n → R^m is multiplication by the m × n matrix A, then, as discussed above, the kernel of T_A is
the null space of A, and the range of T_A is the column space of A.


EXAMPLE  14  Kernel  and  Range  of  the  Zero  Transformation 

Let T: V → W be the zero transformation. Since T maps every vector in V into 0, it follows that
ker(T) = V. Moreover, since 0 is the only image under T of vectors in V, it follows that R(T) = {0}.


EXAMPLE  15  Kernel  and  Range  of  the  Identity  Operator 

Let I: V → V be the identity operator. Since I(v) = v for all vectors in V, every vector in V is the image
of some vector (namely, itself); thus R(I) = V. Since the only vector that I maps into 0 is 0, it follows
that ker(I) = {0}.


EXAMPLE  1 6 Kernel  and  Range  of  an  Orthogonal  Projection 


Let T: R^3 → R^3 be the orthogonal projection on the xy-plane. As illustrated in Figure 8.1.3a, the points
that T maps into 0 = (0, 0, 0) are precisely those on the z-axis, so ker(T) is the set of points of the form
(0, 0, z). As illustrated in Figure 8.1.3b, T maps the points in R^3 to the xy-plane, where each point in that
plane is the image of each point on the vertical line above it. Thus, R(T) is the set of points of the form
(x, y, 0).


Figure 8.1.3 (a) ker(T) is the z-axis.


EXAMPLE  17  Kernel  and  Range  of  a Rotation 

Let T: R^2 → R^2 be the linear operator that rotates each vector in the xy-plane through the angle θ (Figure
8.1.4). Since every vector in the xy-plane can be obtained by rotating some vector through the angle θ, it
follows that R(T) = R^2. Moreover, the only vector that rotates into 0 is 0, so ker(T) = {0}.


Figure  8.1.4 


CALCULUS  REQUIRED 

EXAMPLE  1 8 Kernel  of  a Differentiation  Transformation 

Let V = C^1(-∞, ∞) be the vector space of functions with continuous first derivatives on (-∞, ∞),
let W = F(-∞, ∞) be the vector space of all real-valued functions defined on (-∞, ∞), and let
D: V → W be the differentiation transformation D(f) = f'(x). The kernel of D is the set of functions in
V with derivative zero. From calculus, this is the set of constant functions on (-∞, ∞).


Properties  of  Kernel  and  Range 

In all of the preceding examples, ker(T) and R(T) turned out to be subspaces. In Example 14, Example 15, and
Example 17 they were either the zero subspace or the entire vector space. In Example 16 the kernel was a line
through the origin, and the range was a plane through the origin, both of which are subspaces of R^3. All of this is a
consequence  of  the  following  general  theorem. 


THEOREM  8.1.3 

If T: V → W is a linear transformation, then:

(a)  The  kernel  of  T is  a subspace  of  V. 

(b)  The  range  of  T is  a subspace  of  W. 


Proof (a) To show that ker(T) is a subspace, we must show that it contains at least one vector and is closed under
addition and scalar multiplication. By part (a) of Theorem 8.1.1, the vector 0 is in ker(T), so the kernel contains at
least one vector. Let v_1 and v_2 be vectors in ker(T), and let k be any scalar. Then

T(v_1 + v_2) = T(v_1) + T(v_2) = 0 + 0 = 0

so v_1 + v_2 is in ker(T). Also,

T(kv_1) = kT(v_1) = k0 = 0

so kv_1 is in ker(T).

Proof (b) To show that R(T) is a subspace of W, we must show that it contains at least one vector and is closed
under addition and scalar multiplication. However, it contains at least the zero vector of W since T(0) = 0 by
part (a) of Theorem 8.1.1. To prove that it is closed under addition and scalar multiplication, we must show that if
w_1 and w_2 are vectors in R(T), and if k is any scalar, then there exist vectors a and b in V for which

T(a) = w_1 + w_2 and T(b) = kw_1     (4)

But the fact that w_1 and w_2 are in R(T) tells us that there exist vectors v_1 and v_2 in V such that

T(v_1) = w_1 and T(v_2) = w_2

The following computations complete the proof by showing that the vectors a = v_1 + v_2 and b = kv_1 satisfy the
equations in (4):

T(a) = T(v_1 + v_2) = T(v_1) + T(v_2) = w_1 + w_2
T(b) = T(kv_1) = kT(v_1) = kw_1


CALCULUS  REQUIRED 

EXAMPLE  19  Application  to  Differential  Equations 

Differential equations of the form

y'' + ω^2 y = 0   (ω a positive constant)     (5)

arise in the study of vibrations. The set of all solutions of this equation on the interval (-∞, ∞) is the
kernel of the linear transformation D: C^2(-∞, ∞) → C(-∞, ∞), given by

D(y) = y'' + ω^2 y

It is proved in standard textbooks on differential equations that the kernel is a two-dimensional subspace
of C^2(-∞, ∞), so that if we can find two linearly independent solutions of (5), then all other solutions
can be expressed as linear combinations of those two. We leave it for you to confirm by differentiating
that

y_1 = cos ωx and y_2 = sin ωx

are solutions of (5). These functions are linearly independent since neither is a scalar multiple of the other,
and thus

y = c_1 cos ωx + c_2 sin ωx     (6)

is a "general solution" of (5) in the sense that every choice of c_1 and c_2 produces a solution, and every
solution is of this form.
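The claim that cos ωx, sin ωx, and their linear combinations lie in the kernel of D(y) = y'' + ω^2 y can be spot-checked numerically. The sketch below approximates y'' by a central second difference, so the residual is only approximately zero; the function name residual, the step size h, and the sample values of ω, c_1, and c_2 are assumptions made for illustration.

```python
import math

def residual(y, omega, x, h=1e-3):
    """Approximate y''(x) + omega^2 * y(x) using a central second difference."""
    second = (y(x + h) - 2.0 * y(x) + y(x - h)) / (h * h)
    return second + omega**2 * y(x)

omega = 2.0            # sample positive constant (an assumption)
c1, c2 = 1.5, -0.75    # arbitrary sample coefficients (an assumption)

y1 = lambda x: math.cos(omega * x)
y2 = lambda x: math.sin(omega * x)
general = lambda x: c1 * y1(x) + c2 * y2(x)

sample_points = [-1.0, 0.0, 0.7, 2.5]
worst = max(abs(residual(f, omega, x))
            for f in (y1, y2, general)
            for x in sample_points)

# For comparison, a function that is not in the kernel.
not_solution = residual(lambda x: x * x, omega, 1.0)

print(worst, not_solution)
```

The value of worst is small and reflects only the finite-difference error, whereas the residual for x^2 is far from zero.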


Rank  and  Nullity  of  Linear  Transformations 


In Definition 1 of Section 4.8 we defined the notions of rank and nullity for an m × n matrix, and in Theorem 4.8.2,
which we called the Dimension Theorem, we proved that the sum of the rank and nullity is n. We will show next
that  this  result  is  a special  case  of  a more  general  result  about  linear  transformations.  We  start  with  the  following 
definition. 


DEFINITION  3 

Let T: V → W be a linear transformation. If the range of T is finite-dimensional, then its dimension is called
the rank of T; and if the kernel of T is finite-dimensional, then its dimension is called the nullity of T. The
rank of T is denoted by rank(T) and the nullity of T by nullity(T).


The  following  theorem,  whose  proof  is  optional,  generalizes  Theorem  4.8.2. 


THEOREM 8.1.4 Dimension Theorem for Linear Transformations

If T: V → W is a linear transformation from an n-dimensional vector space V to a vector space W, then

rank(T) + nullity(T) = n     (7)

In the special case where A is an m × n matrix and T_A: R^n → R^m is multiplication by A, the kernel of T_A is the null
space of A, and the range of T_A is the column space of A. Thus, it follows from Theorem 8.1.4 that

rank(T_A) + nullity(T_A) = n
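For a matrix transformation, the two terms in Formula (7) can be read off from row reduction: each pivot column contributes one dimension to the range, and each free column contributes one dimension to the kernel. The sketch below counts both for the standard matrices of the transformations in Exercises 14 and 16 of this section; the function name pivot_and_free_columns is hypothetical, and the code illustrates the matrix case only rather than proving the theorem.

```python
from fractions import Fraction

def pivot_and_free_columns(rows):
    """Row-reduce exactly and return (pivot column indices, free column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots, free = [], []
    r = 0  # index of the row where the next pivot will be placed
    for col in range(n_cols):
        # Look for a nonzero entry in this column at or below row r.
        pivot = next((i for i in range(r, n_rows) if m[i][col] != 0), None)
        if pivot is None:
            free.append(col)      # no pivot: this column is free
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, n_rows):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
    return pivots, free

# Standard matrix of T(x, y) = (2x - y, -8x + 4y) from Exercise 14.
A14 = [[2, -1],
       [-8, 4]]

# Standard matrix of the transformation from R^4 to R^3 in Exercise 16.
A16 = [[4, 1, -2, -3],
       [2, 1, 1, -4],
       [6, 0, -9, 9]]

for name, A in (("A14", A14), ("A16", A16)):
    pivots, free = pivot_and_free_columns(A)
    rank, nullity, n = len(pivots), len(free), len(A[0])
    print(name, "rank =", rank, "nullity =", nullity, "n =", n)
```

Because every column is either a pivot column or a free column, the two counts always add up to n, which is the matrix form of Formula (7).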


OPTIONAL 


Proof of Theorem 8.1.4 We must show that

dim(R(T)) + dim(ker(T)) = n

We will give the proof for the case where 1 ≤ dim(ker(T)) < n. The cases where dim(ker(T)) = 0 and
dim(ker(T)) = n are left as exercises. Assume dim(ker(T)) = r, and let v_1, ..., v_r be a basis for the kernel. Since
{v_1, ..., v_r} is linearly independent, Theorem 4.5.5b states that there are n - r vectors, v_{r+1}, ..., v_n, such that the
extended set {v_1, ..., v_r, v_{r+1}, ..., v_n} is a basis for V. To complete the proof, we will show that the n - r vectors
in the set S = {T(v_{r+1}), ..., T(v_n)} form a basis for the range of T. It will then follow that

dim(R(T)) + dim(ker(T)) = (n - r) + r = n

First we show that S spans the range of T. If b is any vector in the range of T, then b = T(v) for some vector v in
V. Since {v_1, ..., v_r, v_{r+1}, ..., v_n} is a basis for V, the vector v can be written in the form

v = c_1 v_1 + ... + c_r v_r + c_{r+1} v_{r+1} + ... + c_n v_n

Since v_1, ..., v_r lie in the kernel of T, we have T(v_1) = ... = T(v_r) = 0, so

b = T(v) = c_{r+1} T(v_{r+1}) + ... + c_n T(v_n)

Thus S spans the range of T.

Finally, we show that S is a linearly independent set and consequently forms a basis for the range of T. Suppose that
some linear combination of the vectors in S is zero; that is,

k_{r+1} T(v_{r+1}) + ... + k_n T(v_n) = 0     (8)

We must show that k_{r+1} = ... = k_n = 0. Since T is linear, (8) can be rewritten as

T(k_{r+1} v_{r+1} + ... + k_n v_n) = 0

which says that k_{r+1} v_{r+1} + ... + k_n v_n is in the kernel of T. This vector can therefore be written as a linear
combination of the basis vectors {v_1, ..., v_r}, say

k_{r+1} v_{r+1} + ... + k_n v_n = k_1 v_1 + ... + k_r v_r

Thus,

k_1 v_1 + ... + k_r v_r - k_{r+1} v_{r+1} - ... - k_n v_n = 0

Since {v_1, ..., v_n} is linearly independent, all of the k's are zero; in particular, k_{r+1} = ... = k_n = 0, which
completes the proof.


Concept  Review 

Linear  transformation 
Linear  operator 
Zero  transformation 
Identity  operator 
Contraction 
Dilation 

Evaluation  transformation 

Kernel 

Range 

Rank 

Nullity 

Skills 

Determine  whether  a function  is  a linear  transformation. 

Find a formula for a linear transformation T: V → W given the values of T on a basis for V.
Find  a basis  for  the  kernel  of  a linear  transformation. 

Find  a basis  for  the  range  of  a linear  transformation. 

Find  the  rank  of  a linear  transformation. 

Find  the  nullity  of  a linear  transformation. 


Exercise  Set  8.1 


In  Exercises  1-8,  determine  whether  the  function  is  a linear  transformation.  Justify  your  answer. 
1. T: V → R, where V is an inner product space, and T(u) = ||u||.

Answer: 

Nonlinear 

2. T: R^3 → R^3, where v_0 is a fixed vector in R^3 and T(u) = u × v_0.

3. T: M_22 → M_23, where B is a fixed 2 × 3 matrix and T(A) = AB.

Answer: 

Linear 


4. T: M_nn → R, where T(A) = tr(A).

5. F: M_mn → M_nm, where F(A) = A^T.


Answer: 


Linear 

6. T: M_22 → R, where


7. T: P_2 → P_2, where


Answer: 


(a)  Linear 

(b)  Nonlinear 


8. T: F(-∞, ∞) → F(-∞, ∞), where

(a) T(f(x)) = 1 + f(x)

(b) T(f(x)) = f(x + 1)


9. Consider the basis S = {v_1, v_2} for R^2, where v_1 = (1, 1) and v_2 = (1, 0), and let T: R^2 → R^2 be the linear
operator for which

T(v_1) = (1, -2) and T(v_2) = (-4, 1)

Find a formula for T(x_1, x_2), and use that formula to find T(5, -3).

Answer:

T(x_1, x_2) = (-4x_1 + 5x_2, x_1 - 3x_2); T(5, -3) = (-35, 14)

10. Consider the basis S = {v_1, v_2} for R^2, where v_1 = (-2, 1) and v_2 = (1, 3), and let T: R^2 → R^3 be the
linear transformation such that

T(v_1) = (-1, 2, 0) and T(v_2) = (0, -3, 5)

Find a formula for T(x_1, x_2), and use that formula to find T(2, -3).

11. Consider the basis S = {v_1, v_2, v_3} for R^3, where v_1 = (1, 1, 1), v_2 = (1, 1, 0), and v_3 = (1, 0, 0), and let
T: R^3 → R^3 be the linear operator for which

T(v_1) = (2, -1, 4), T(v_2) = (3, 0, 1), T(v_3) = (-1, 5, 1)

Find a formula for T(x_1, x_2, x_3), and use that formula to find T(2, 4, -1).

Answer:

T(x_1, x_2, x_3) = (-x_1 + 4x_2 - x_3, 5x_1 - 5x_2 - x_3, x_1 + 3x_3); T(2, 4, -1) = (15, -9, -1)

12. Consider the basis S = {v_1, v_2, v_3} for R^3, where v_1 = (1, 2, 1), v_2 = (2, 9, 0), and v_3 = (3, 3, 4), and let
T: R^3 → R^2 be the linear transformation for which

T(v_1) = (1, 0), T(v_2) = (-1, 1), T(v_3) = (0, 1)

Find a formula for T(x_1, x_2, x_3), and use that formula to find T(7, 13, 7).

13. Let v_1, v_2, and v_3 be vectors in a vector space V, and let T: V → R^3 be a linear transformation for which

T(v_1) = (1, -1, 2), T(v_2) = (0, 3, 2), T(v_3) = (-3, 1, 2)

Find T(2v_1 - 3v_2 + 4v_3).

Answer:

T(2v_1 - 3v_2 + 4v_3) = (-10, -7, 6)

14. Let T: R^2 → R^2 be the linear operator given by the formula

T(x, y) = (2x - y, -8x + 4y)

Which of the following vectors are in R(T)?

(a) (1, -4)

(b) (5, 0)

(c) (-3, 12)

15. Let T: R^2 → R^2 be the linear operator in Exercise 14. Which of the following vectors are in ker(T)?

(a) (5, 10)

(b) (3, 2)

(c) (1, 1)

Answer: 


(a) 


16. Let T: R^4 → R^3 be the linear transformation given by the formula

T(x_1, x_2, x_3, x_4) = (4x_1 + x_2 - 2x_3 - 3x_4, 2x_1 + x_2 + x_3 - 4x_4, 6x_1 - 9x_3 + 9x_4)

Which of the following are in R(T)?

(a) (0, 0, 6)

(b) (1, 3, 0)

(c) (2, 4, 1)

17. Let T: R^4 → R^3 be the linear transformation in Exercise 16. Which of the following are in ker(T)?

(a) (3, -8, 2, 0)

(b) (0, 0, 0, 1)

(c) (0, -4, 1, 0)

Answer: 

(a) 

18. Let T: P_2 → P_3 be the linear transformation defined by T(p(x)) = xp(x). Which of the following are in
ker(T)?

(a) x^2

(b) 0

(c) 1 + x

19. Let T: P_2 → P_3 be the linear transformation in Exercise 18. Which of the following are in R(T)?

(a) x + x^2

(b) 1 + x

(c) 3 - x^2

Answer: 

(a) 

20.  Find  a basis  for  the  kernel  of 

(a)  the  linear  operator  in  Exercise  14. 

(b)  the  linear  transformation  in  Exercise  16. 

(c)  the  linear  transformation  in  Exercise  18. 

21.  Find  a basis  for  the  range  of 

(a)  the  linear  operator  in  Exercise  14. 

(b)  the  linear  transformation  in  Exercise  16. 

(c)  the  linear  transformation  in  Exercise  18. 

Answer: 


(a)  (1,  -4) 

(b)  (4,2,6),  (1,1,0),  (-3,  -4,9) 


(c) x, x^2, x^3

22.  Verify  Formula  7 of  the  dimension  theorem  for 

(a)  the  linear  operator  in  Exercise  14. 

(b)  the  linear  transformation  in  Exercise  16. 

(c)  the  linear  transformation  in  Exercise  18. 

In Exercises 23-26, let T be multiplication by the matrix A. Find

(a)  a basis  for  the  range  of  T. 

(b)  a basis  for  the  kernel  of  T. 

(c)  the  rank  and  nullity  of  T. 

(d)  the  rank  and  nullity  of  A. 


23. A =
[ 1  -1   3 ]
[ 5   6  -4 ]
[ 7   4   2 ]

Answer:

(a) (1, 5, 7), (-1, 6, 4)

(b) (-14, 19, 11)

(c) Rank(T) = 2, nullity(T) = 1

(d) Rank(A) = 2, nullity(A) = 1

24. A =
[ 2   0  -1 ]
[ 4   0  -2 ]
[ 20  0   0 ]

25. A =
[ 4  1  5  2 ]
[ 1  2  3  0 ]

Answer:

(a) (1, 0), (0, 1)

(b) (-1, -1, 1, 0), (-4, 2, 0, 7)

(c) Rank(T) = nullity(T) = 2

(d) Rank(A) = nullity(A) = 2

26. A =
[ 1   4   5  0   9 ]
[ 3  -2   1  0  -1 ]
[ 1   0  -1  0  -1 ]
[ 2   3   5  1   8 ]

27.  Describe  the  kernel  and  range  of 

(a) the orthogonal projection on the xz-plane.

(b)  the  orthogonal  projection  on  the  yz-plane. 

(c)  the  orthogonal  projection  on  the  plane  defined  by  the  equation  y = x. 

Answer: 

(a)  Kernel:  y-axis;  range:  xz-plane 

(b) Kernel: x-axis; range: yz-plane

(c)  Kernel:  the  line  through  the  origin  perpendicular  to  the  plane  y = x;  range:  plane  y = x 

28. Let V be any vector space, and let T: V → V be defined by T(v) = 3v.

(a) What is the kernel of T?

(b) What is the range of T?

29.  In  each  part,  use  the  given  information  to  find  the  nullity  of  the  linear  transformation  T. 

(a) T: R^5 → R^7 has rank 3.

(b) T: P_4 → P_3 has rank 1.

(c) The range of T: R^6 → R^3 is R^3.

(d) T: M_22 → M_22 has rank 3.

Answer: 

(a)  Nullity  (T)  = 2 

(b)  Nullity (T)  =4 

(c)  Nullity  (T)  = 3 

(d)  Nullity (T)  = 1 

30. Let A be a 7 × 6 matrix such that Ax = 0 has only the trivial solution, and let T: R^6 → R^7 be multiplication by
A. Find the rank and nullity of T.

31.  Let  A be  a 5 x 7 matrix  with  rank  4. 

(a)  What  is  the  dimension  of  the  solution  space  of  Ax  = 0? 

(b) Is Ax = b consistent for all vectors b in R^5? Explain.

Answer: 

(a)  3 

(b)  No 

32. Let T: R^3 → W be a linear transformation from R^3 to any vector space. Give a geometric description of ker(T).

33. Let T: V → R^3 be a linear transformation from any vector space to R^3. Give a geometric description of R(T).

Answer:

A line through the origin, a plane through the origin, the origin only, or all of R^3

34. Let T: R^3 → R^3 be multiplication by

[  1  3  4 ]
[  3  4  7 ]
[ -2  2  0 ]

(a) Show that the kernel of T is a line through the origin, and find parametric equations for it.

(b) Show that the range of T is a plane through the origin, and find an equation for it.

35. (a) Show that if a_1, a_2, b_1, and b_2 are any scalars, then the formula

F(x, y) = (a_1 x + b_1 y, a_2 x + b_2 y)

defines a linear operator on R^2.

(b) Does the formula F(x, y) = (a_1 x^2 + b_1 y^2, a_2 x^2 + b_2 y^2) define a linear operator on R^2? Explain.

Answer: 

(b)  No 

36. Let {v_1, v_2, ..., v_n} be a basis for a vector space V, and let T: V → W be a linear transformation. Show that if

T(v_1) = T(v_2) = ... = T(v_n) = 0

then T is the zero transformation.

37. Let {v_1, v_2, ..., v_n} be a basis for a vector space V, and let T: V → V be a linear operator. Show that if

T(v_1) = v_1, T(v_2) = v_2, ..., T(v_n) = v_n

then T is the identity transformation on V.

38. For a positive integer n > 1, let T: M_nn → R be the linear transformation defined by T(A) = tr(A), where A is
an n × n matrix with real entries. Determine the dimension of ker(T).

39. Prove: If {v_1, v_2, ..., v_n} is a basis for V and w_1, w_2, ..., w_n are vectors in W, not necessarily distinct, then
there exists a linear transformation T: V → W such that

T(v_1) = w_1, T(v_2) = w_2, ..., T(v_n) = w_n

40. (Calculus required) Let V = C[a, b] be the vector space of functions continuous on [a, b], and let T: V → V
be the transformation defined by

T(f) = 5f(x) + 3 ∫_a^x f(t) dt

Is T a linear operator?

41. (Calculus required) Let D: P_3 → P_2 be the differentiation transformation D(p) = p'(x). What is the kernel of
D?

Answer: 


ker(D)  consists  of  all  constant  polynomials. 


42. (Calculus required) Let J: P_1 → R be the integration transformation J(p) = ∫ p(x) dx. What is the kernel
of J?


43. (Calculus required) Let V be the vector space of real-valued functions with continuous derivatives of all orders
on the interval (-∞, ∞), and let W = F(-∞, ∞) be the vector space of real-valued functions defined on
(-∞, ∞).

(a) Find a linear transformation T: V → W whose kernel is P_3.

(b) Find a linear transformation T: V → W whose kernel is P_n.

Answer:

(a) T(f(x)) = f^(4)(x)

(b) T(f(x)) = f^(n+1)(x)

44. If A is an m × n matrix, and if the linear system Ax = b is consistent for every vector b in R^m, what can you
say about the range of T_A: R^n → R^m?

True-False  Exercises 


In  parts  (a)-(i)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a) If T(c_1 v_1 + c_2 v_2) = c_1 T(v_1) + c_2 T(v_2) for all vectors v_1 and v_2 in V and all scalars c_1 and c_2, then T is a
linear transformation.

Answer: 

True 

(b) If v is a nonzero vector in V, then there is exactly one linear transformation T: V → W such that
T(-v) = -T(v).

Answer: 

False 

(c) There is exactly one linear transformation T: V → W for which T(u + v) = T(u - v) for all vectors u and v in
V.

Answer: 

True 

(d)  If  vq  is  a nonzero  vector  in  V,  then  the  formula  T(v)  = vq  + v defines  a linear  operator  on  V. 

Answer: 

False 

(e)  The  kernel  of  a linear  transformation  is  a vector  space. 

Answer: 


True 


(f)  The  range  of  a linear  transformation  is  a vector  space. 

Answer: 

True 

(g) If T: P_6 → M_22 is a linear transformation, then the nullity of T is 3.

Answer: 

False 

(h) The function T: M_22 → R defined by T(A) = det A is a linear transformation.

Answer: 

False 

(i) The linear transformation T: M_22 → M_22 defined by

T(A) 


has  rank  1 . 
Answer: 

False 
8.2 Isomorphism

In this section we will establish a fundamental connection between real finite-dimensional vector spaces and the Euclidean
space R^n. This connection is not only important theoretically, but it has practical applications in that it allows us to perform
vector computations in general vector spaces by working with the vectors in R^n.


One-to-One  and  Onto 

Although many of the theorems in this text have been concerned exclusively with the vector space R^n, this is not as limiting
as it might seem. As we will show, the vector space R^n is the "mother" of all real n-dimensional vector spaces in the sense
that any such space might differ from R^n in the notation used to represent vectors, but not in its algebraic structure. To
explain what we mean by this, we will need two definitions, the first of which is a generalization of Definition 1 in Section
4.10. (See Figure 8.2.1.)

DEFINITION 1

If T: V → W is a linear transformation from a vector space V to a vector space W, then T is said to be one-to-one if
T maps distinct vectors in V into distinct vectors in W.



DEFINITION  2 

If T: V → W is a linear transformation from a vector space V to a vector space W, then T is said to be onto (or onto
W) if every vector in W is the image of at least one vector in V.

Figure 8.2.1 One-to-one: distinct vectors in V have distinct images in W. Not one-to-one: there exist distinct
vectors in V with the same image. Onto W: every vector in W is the image of some vector in V. Not onto W: not
every vector in W is the image of some vector in V.


The  following  theorem  provides  a useful  way  of  telling  whether  a linear  transformation  is  one-to-one  by  examining  its 
kernel. 


THEOREM  8.2.1 


If T: V → W is a linear transformation, then the following statements are equivalent.

(a) T is one-to-one.

(b) ker(T) = {0}

Proof (a) ⇒ (b) Since T is linear, we know that T(0) = 0 by Theorem 8.1.1a. Since T is one-to-one, there can be no
other vectors in V that map into 0, so ker(T) = {0}.

(b) ⇒ (a) Assume that ker(T) = {0}. If u and v are distinct vectors in V, then u - v ≠ 0. This implies that T(u - v) ≠ 0,
for otherwise ker(T) would contain a nonzero vector. Since T is linear, it follows that

T(u) - T(v) = T(u - v) ≠ 0

so T maps distinct vectors in V into distinct vectors in W and hence is one-to-one.


In  the  special  case  where  V is  finite-dimensional  and  T is  a linear  operator  on  V,  then  we  can  add  a third  statement  to  those 
in  Theorem  8.2.1. 


THEOREM  8.2.2 

If V is a finite-dimensional vector space, and if T: V → V is a linear operator, then the following statements are
equivalent.

(a) T is one-to-one.

(b) ker(T) = {0}.

(c) T is onto [i.e., R(T) = V].

Proof We already know that (a) and (b) are equivalent by Theorem 8.2.1, so it suffices to show that (b) and (c) are
equivalent. We leave it for you to do this by assuming that dim(V) = n and applying Theorem 8.1.4.


EXAMPLE  1 Dilations  and  Contractions  Are  One-to-One  and  Onto 

Show that if V is a finite-dimensional vector space and c is any nonzero scalar, then the linear operator
T: V → V defined by T(v) = cv is one-to-one and onto.

Solution The operator T is onto (and hence one-to-one by Theorem 8.2.2), for if v is any vector in V, then that
vector is the image of the vector (1/c)v.


EXAMPLE  2 Matrix  Operators 

If $T_A: R^n \to R^n$ is the matrix operator $T_A(\mathbf{x}) = A\mathbf{x}$, then it follows from parts (r) and (s) of Theorem 5.1.6 that
$T_A$ is one-to-one and onto if and only if A is invertible.


EXAMPLE  3 Shifting  Operators 


Let $V = R^\infty$ be the sequence space discussed in Example 3 of Section 4.1, and consider the linear “shifting
operators” on V defined by

\[ T_1(u_1, u_2, \ldots, u_n, \ldots) = (0, u_1, u_2, \ldots, u_n, \ldots) \]

\[ T_2(u_1, u_2, \ldots, u_n, \ldots) = (u_2, u_3, \ldots, u_n, \ldots) \]

(a) Show that $T_1$ is one-to-one but not onto.

(b) Show that $T_2$ is onto but not one-to-one.

Solution 

(a) The operator $T_1$ is one-to-one because distinct sequences in $R^\infty$ obviously have distinct images. This
operator is not onto because no vector in $R^\infty$ maps into the sequence $(1, 0, 0, \ldots, 0, \ldots)$, for example.

(b) The operator $T_2$ is not one-to-one because, for example, the vectors $(1, 0, 0, \ldots, 0, \ldots)$ and
$(2, 0, 0, \ldots, 0, \ldots)$ both map into $(0, 0, 0, \ldots, 0, \ldots)$. This operator is onto because every possible
sequence of real numbers can be obtained with an appropriate choice of the numbers $u_2, u_3, \ldots, u_n, \ldots$.
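
The behavior of the two shifting operators can be illustrated on finite prefixes of sequences. The sketch below is only an approximation of the infinite-dimensional setting: it assumes each sequence is truncated to a Python list, and the function names `shift_right` and `shift_left` are hypothetical stand-ins for the two operators in this example.

```python
def shift_right(u):
    """Model of T1: (u1, u2, ...) -> (0, u1, u2, ...), on a finite prefix."""
    return [0] + list(u)

def shift_left(u):
    """Model of T2: (u1, u2, ...) -> (u2, u3, ...), on a finite prefix."""
    return list(u)[1:]

u = [1, 0, 0, 0]
v = [2, 0, 0, 0]

# T2 is not one-to-one: two distinct inputs share one image.
print(shift_left(u) == shift_left(v))

# Every image under T1 starts with 0, so (1, 0, 0, ...) is never reached.
print(shift_right(u)[0], shift_right(v)[0])

# T2 undoes T1, but T1 does not undo T2 (the first entry is lost).
print(shift_left(shift_right(u)) == u, shift_right(shift_left(u)) == u)
```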


Why  does  Example  3 not  violate  Theorem  8.2.2? 


EXAMPLE  4 Basic  Transformations  That  Are  One-to-One  and  Onto 


The linear transformations $T_1: P_3 \to R^4$ and $T_2: M_{22} \to R^4$ defined by

\[ T_1(a + bx + cx^2 + dx^3) = (a, b, c, d) \]

\[ T_2\left( \begin{bmatrix} a & b \\ c & d \end{bmatrix} \right) = (a, b, c, d) \]

are both one-to-one and onto (verify by showing that their kernels contain only the zero vector).


EXAMPLE  5 A One-to-One  Linear  Transformation 

Let $T: P_n \to P_{n+1}$ be the linear transformation

\[ T(\mathbf{p}) = T(p(x)) = xp(x) \]

discussed in Example 5 of Section 8.1. If

\[ \mathbf{p} = p(x) = c_0 + c_1 x + \cdots + c_n x^n \quad \text{and} \quad \mathbf{q} = q(x) = d_0 + d_1 x + \cdots + d_n x^n \]

are distinct polynomials, then they differ in at least one coefficient. Thus,

\[ T(\mathbf{p}) = c_0 x + c_1 x^2 + \cdots + c_n x^{n+1} \quad \text{and} \quad T(\mathbf{q}) = d_0 x + d_1 x^2 + \cdots + d_n x^{n+1} \]

also  differ  in  at  least  one  coefficient.  It  follows  that  T is  one-to-one  since  it  maps  distinct  polynomials  p and  q 
into  distinct  polynomials  T(p)  and  T(q). 

CALCULUS  REQUIRED 

EXAMPLE  6 A Transformation  That  Is  Not  One-to-One 


Let 


\[ D: C^1(-\infty, \infty) \to F(-\infty, \infty) \]


be  the  differentiation  transformation  discussed  in  Example  11  of  Section  8.1.  This  linear  transformation  is  not 
one-to-one  because  it  maps  functions  that  differ  by  a constant  into  the  same  function.  For  example, 


\[ D(x^2) = D(x^2 + 1) = 2x \]


Dimension  and  Linear  Transformations 

In the exercises we will ask you to prove the following two important facts about a linear transformation $T: V \to W$ in the
case where V and W are finite-dimensional:

If $\dim(W) < \dim(V)$, then T cannot be one-to-one.

If $\dim(V) < \dim(W)$, then T cannot be onto.

Stated  informally,  if  a linear  transformation  maps  a “bigger”  space  to  a “smaller”  space,  then  some  points  in  the  “bigger” 
space  must  have  the  same  image;  and  if  a linear  transformation  maps  a “smaller”  space  to  a “bigger”  space,  then  there  must 
be  points  in  the  “bigger”  space  that  are  not  images  of  any  points  in  the  “smaller”  space. 

These observations tell us, for example, that any linear transformation from $R^3$ to $R^2$ must map some distinct
points of $R^3$ into the same point in $R^2$, and they also tell us that there is no linear transformation that maps $R^2$ onto all of $R^3$.


Isomorphism 

Our  next  definition  paves  the  way  for  the  main  result  in  this  section. 


DEFINITION  3 

If a linear transformation $T: V \to W$ is both one-to-one and onto, then T is said to be an isomorphism, and the
vector  spaces  V and  W are  said  to  be  isomorphic. 




The word isomorphic is derived from the Greek words iso, meaning “identical,” and morphe, meaning “form.” This
terminology is appropriate because, as we will now explain, isomorphic vector spaces have the same “algebraic form,” even
though they may consist of different kinds of objects. To illustrate this idea, examine Table 1 in which we have shown how
the isomorphism

\[ a_0 + a_1 x + a_2 x^2 \to (a_0, a_1, a_2) \]

matches up vector operations in $P_2$ and $R^3$.


Table 1

Operation in $P_2$                                      Operation in $R^3$
$3(1 - 2x + 3x^2) = 3 - 6x + 9x^2$                      $3(1, -2, 3) = (3, -6, 9)$
$(2 + x - x^2) + (1 - x + 5x^2) = 3 + 4x^2$             $(2, 1, -1) + (1, -1, 5) = (3, 0, 4)$
$(4 + 2x + 3x^2) - (2 - 4x + 3x^2) = 2 + 6x$            $(4, 2, 3) - (2, -4, 3) = (2, 6, 0)$
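
The correspondence in Table 1 can be checked mechanically. The sketch below is an illustration only: it assumes a polynomial is stored as a dictionary from powers to coefficients, and the names `poly_scale`, `poly_add`, and `to_vector` are hypothetical helpers rather than anything defined in the text.

```python
def poly_scale(c, p):
    """Scalar multiple of a polynomial stored as {power: coefficient}."""
    return {k: c * a for k, a in p.items()}

def poly_add(p, q):
    """Sum of two polynomials stored as {power: coefficient}."""
    return {k: p.get(k, 0) + q.get(k, 0) for k in set(p) | set(q)}

def to_vector(p, n=3):
    """The map a0 + a1 x + a2 x^2 -> (a0, a1, a2) from P2 to R^3."""
    return tuple(p.get(k, 0) for k in range(n))

p = {0: 1, 1: -2, 2: 3}    # 1 - 2x + 3x^2
q1 = {0: 2, 1: 1, 2: -1}   # 2 + x - x^2
q2 = {0: 1, 1: -1, 2: 5}   # 1 - x + 5x^2

# Operating in P2 and then mapping agrees with mapping and then operating in R^3.
print(to_vector(poly_scale(3, p)))   # (3, -6, 9)
print(to_vector(poly_add(q1, q2)))   # (3, 0, 4)
```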

The following theorem, which is one of the most important results in linear algebra, reveals the fundamental importance of
the vector space $R^n$.

THEOREM 8.2.3

Every real n-dimensional vector space is isomorphic to $R^n$.


Theorem 8.2.3 tells us that a real n-dimensional vector
space may differ from $R^n$ in notation, but its algebraic
structure will be the same.


Proof Let V be a real n-dimensional vector space. To prove that V is isomorphic to $R^n$ we must find a linear
transformation $T: V \to R^n$ that is one-to-one and onto. For this purpose, let

\[ \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n \]

be any basis for V, let

\[ \mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n \tag{1} \]

be the representation of a vector u in V as a linear combination of the basis vectors, and define the transformation
$T: V \to R^n$ by

\[ T(\mathbf{u}) = (k_1, k_2, \ldots, k_n) \tag{2} \]

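We will show that T is an isomorphism (linear, one-to-one, and onto). To prove the linearity, let u and v be vectors in V, let
c be a scalar, and let

\[ \mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n \quad \text{and} \quad \mathbf{v} = d_1\mathbf{v}_1 + d_2\mathbf{v}_2 + \cdots + d_n\mathbf{v}_n \tag{3} \]

be the representations of u and v as linear combinations of the basis vectors. Then it follows from (1) that

\[ T(c\mathbf{u}) = T(ck_1\mathbf{v}_1 + ck_2\mathbf{v}_2 + \cdots + ck_n\mathbf{v}_n) = (ck_1, ck_2, \ldots, ck_n) = c(k_1, k_2, \ldots, k_n) = cT(\mathbf{u}) \]

and it follows from (2) that

\[ T(\mathbf{u} + \mathbf{v}) = T((k_1 + d_1)\mathbf{v}_1 + (k_2 + d_2)\mathbf{v}_2 + \cdots + (k_n + d_n)\mathbf{v}_n) \]
\[ = (k_1 + d_1, k_2 + d_2, \ldots, k_n + d_n) \]
\[ = (k_1, k_2, \ldots, k_n) + (d_1, d_2, \ldots, d_n) \]
\[ = T(\mathbf{u}) + T(\mathbf{v}) \]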


which shows that T is linear. To show that T is one-to-one, we must show that if u and v are distinct vectors in V, then so are
their images in $R^n$. But if $\mathbf{u} \neq \mathbf{v}$, and if the representations of these vectors in terms of the basis vectors are as in (3), then we
must have $k_i \neq d_i$ for at least one i. Thus,

\[ T(\mathbf{u}) = (k_1, k_2, \ldots, k_n) \neq (d_1, d_2, \ldots, d_n) = T(\mathbf{v}) \]

which shows that u and v have distinct images under T. Finally, the transformation T is onto, for if

\[ \mathbf{w} = (k_1, k_2, \ldots, k_n) \]

is any vector in $R^n$, then it follows from (2) that w is the image under T of the vector

\[ \mathbf{u} = k_1\mathbf{v}_1 + k_2\mathbf{v}_2 + \cdots + k_n\mathbf{v}_n \]

Note that the isomorphism T in Formula 2 of the foregoing proof is the coordinate map

\[ \mathbf{u} \to (k_1, k_2, \ldots, k_n) = (\mathbf{u})_S \]

that maps u into its coordinate vector with respect to the basis $S = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$. Since there are generally many
possible bases for a given vector space V, there are generally many possible isomorphisms between V and $R^n$, one for each
different basis.


EXAMPLE 7 The Natural Isomorphism from $P_{n-1}$ to $R^n$

We leave it for you to verify that the mapping

\[ a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} \to (a_0, a_1, \ldots, a_{n-1}) \]

from $P_{n-1}$ to $R^n$ is one-to-one, onto, and linear. This is called the natural isomorphism from $P_{n-1}$ to $R^n$
because, as the following computations show, it maps the natural basis $\{1, x, x^2, \ldots, x^{n-1}\}$ for $P_{n-1}$ into the
standard basis for $R^n$:

\[ 1 = 1 + 0x + 0x^2 + \cdots + 0x^{n-1} \to (1, 0, 0, \ldots, 0) \]
\[ x = 0 + x + 0x^2 + \cdots + 0x^{n-1} \to (0, 1, 0, \ldots, 0) \]
\[ \vdots \]
\[ x^{n-1} = 0 + 0x + 0x^2 + \cdots + x^{n-1} \to (0, 0, 0, \ldots, 1) \]


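EXAMPLE 8 The Natural Isomorphism from $M_{22}$ to $R^4$

The matrices

\[ E_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad E_4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \]

form a basis for the vector space $M_{22}$ of 2 × 2 matrices. An isomorphism $T: M_{22} \to R^4$ can be constructed by
first writing a matrix A in $M_{22}$ in terms of the basis vectors as

\[ A = \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix} = a_1 \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + a_2 \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + a_3 \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + a_4 \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \]

and then defining T as

\[ T(A) = (a_1, a_2, a_3, a_4) \]

Thus, for example,

\[ \begin{bmatrix} 1 & -3 \\ 4 & 6 \end{bmatrix} \to (1, -3, 4, 6) \]

More generally, this idea can be used to show that the vector space $M_{mn}$ of m × n matrices with real entries is
isomorphic to $R^{mn}$.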


EXAMPLE  9 Differentiation  by  Matrix  Multiplication 


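Consider the differentiation transformation $D: P_3 \to P_2$ on the vector space of polynomials of degree three or
less. If we map $P_3$ and $P_2$ into $R^4$ and $R^3$, respectively, by the natural isomorphisms, then the transformation D
produces a corresponding matrix transformation from $R^4$ to $R^3$. Specifically, the derivative transformation

\[ a_0 + a_1 x + a_2 x^2 + a_3 x^3 \to a_1 + 2a_2 x + 3a_3 x^2 \]

produces the matrix transformation

\[ \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} a_1 \\ 2a_2 \\ 3a_3 \end{bmatrix} \]

Thus, for example, the derivative

\[ \frac{d}{dx}(2 + x + 4x^2 - x^3) = 1 + 8x - 3x^2 \]

can be calculated as the matrix product

\[ \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ 4 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 8 \\ -3 \end{bmatrix} \]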

This  idea  is  useful  for  constructing  numerical  algorithms  to  perform  derivative  calculations. 
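
The matrix in this example generalizes to any degree. The following Python sketch builds the n × (n + 1) differentiation matrix for $D: P_n \to P_{n-1}$ and applies it to a coefficient vector; the function names `diff_matrix` and `mat_vec` are hypothetical, and coefficients are assumed to be listed from the constant term upward.

```python
def diff_matrix(n):
    """n x (n + 1) matrix of d/dx from P_n to P_(n-1) in the natural bases."""
    return [[(j if j == i + 1 else 0) for j in range(n + 1)] for i in range(n)]

def mat_vec(A, x):
    """Matrix-vector product for lists of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

D = diff_matrix(3)
print(D)                       # [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]

coeffs = [2, 1, 4, -1]         # 2 + x + 4x^2 - x^3
print(mat_vec(D, coeffs))      # [1, 8, -3], that is, 1 + 8x - 3x^2
```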


Inner  Product  Space  Isomorphisms 

In the case where V is a real n-dimensional inner product space, both V and $R^n$ have, in addition to their algebraic structure,
a geometric structure arising from their respective inner products. Thus, it is reasonable to inquire if there exists an
isomorphism from V to $R^n$ that preserves the geometric structure as well as the algebraic structure. For example, we would
want orthogonal vectors in V to have orthogonal counterparts in $R^n$, and we would want orthonormal sets in V to
correspond to orthonormal sets in $R^n$.

In order for an isomorphism to preserve geometric structure, it obviously has to preserve inner products, since notions of
length, angle, and orthogonality are all based on the inner product. Thus, if V and W are inner product spaces, then we call
an isomorphism $T: V \to W$ an inner product space isomorphism if

\[ \langle T(\mathbf{u}), T(\mathbf{v}) \rangle = \langle \mathbf{u}, \mathbf{v} \rangle \]

It can be proved that if V is any real n-dimensional inner product space and $R^n$ has the Euclidean inner product (the dot
product), then there exists an inner product space isomorphism from V to $R^n$. Under such an isomorphism, the inner
product space V has the same algebraic and geometric structure as $R^n$. In this sense, every n-dimensional inner product
space is a “carbon copy” of $R^n$ with the Euclidean inner product that differs only in the notation used to represent vectors.


EXAMPLE  10  An  Inner  Product  Space  Isomorphism 


Let $R^n$ be the vector space of real n-tuples in comma-delimited form, let $M_n$ be the vector space of real n × 1
matrices, let $R^n$ have the Euclidean inner product $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u} \cdot \mathbf{v}$, and let $M_n$ have the inner product
$\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^T \mathbf{v}$ in which u and v are expressed in column form. The mapping $T: R^n \to M_n$ defined by

\[ (v_1, v_2, \ldots, v_n) \to \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \]

is an inner product space isomorphism, so the distinction between the inner product space $R^n$ and the inner
product space $M_n$ is essentially notational, a fact that we have used many times in this text.


Concept  Review 

One-to-one
Onto 

Isomorphism 
Isomorphic  vector  spaces 
Natural  isomorphism 
Inner  product  space  isomorphism 

Skills 

Determine  whether  a linear  transformation  is  one-to-one. 
Determine  whether  a linear  transformation  is  onto. 

Determine  whether  a linear  transformation  is  an  isomorphism. 


Exercise  Set  8.2 

1. In each part, find $\ker(T)$, and determine whether the linear transformation T is one-to-one.

(a) $T: R^2 \to R^2$, where $T(x, y) = (y, x)$

(b) $T: R^2 \to R^2$, where $T(x, y) = (0, 2x + 3y)$

(c) $T: R^2 \to R^2$, where $T(x, y) = (x + y, x - y)$

(d) $T: R^2 \to R^3$, where $T(x, y) = (x, y, x + y)$

(e) $T: R^2 \to R^3$, where $T(x, y) = (x - y, y - x, 2x - 2y)$

(f) $T: R^3 \to R^2$, where $T(x, y, z) = (x + y + z, x - y - z)$

Answer:

(a) $\ker(T) = \{\mathbf{0}\}$; T is one-to-one

(b) $\ker(T) = \{k(-\tfrac{3}{2}, 1)\}$; T is not one-to-one

(c) $\ker(T) = \{\mathbf{0}\}$; T is one-to-one

(d) $\ker(T) = \{\mathbf{0}\}$; T is one-to-one

(e) $\ker(T) = \{k(1, 1)\}$; T is not one-to-one

(f) $\ker(T) = \{k(0, 1, -1)\}$; T is not one-to-one


2.  Which  of  the  transformations  in  Exercise  1 are  onto? 

3.  In  each  part,  determine  whether  multiplication  by  A is  a one-to-one  linear  transformation. 


(a) 


A = 


(b) 


A = 


(c) 


A = 


Answer: 


1 -2 
2 -4 


-3 

1 
2 

-1 

4 -2 
1 5 

5 3 


6 

3 

-1 

3 


(a)  Not  one-to-one 

(b)  Not  one-to-one 

(c)  One-to-one 

4.  Which  of  the  transformations  in  Exercise  3 are  onto? 

5. As indicated in the accompanying figure, let $T: R^2 \to R^2$ be the orthogonal projection on the line $y = x$.

(a) Find the kernel of T.

(b) Is T one-to-one? Justify your conclusion.

Figure Ex-5 (the line y = x in the xy-plane)

Answer:

(a) $\ker(T) = \{k(-1, 1)\}$

(b) T is not one-to-one since $\ker(T) \neq \{\mathbf{0}\}$.

6. As indicated in the accompanying figure, let $T: R^2 \to R^2$ be the linear operator that reflects each point about the y-axis.

(a) Find the kernel of T.

(b) Is T one-to-one? Justify your conclusion.

Figure Ex-6

7.  In  each  part,  use  the  given  information  to  determine  whether  the  linear  transformation  T is  one-to-one. 

(a) $T: R^m \to R^m$; $\text{nullity}(T) = 0$

(b) $T: R^n \to R^n$; $\text{rank}(T) = n - 1$

(c) $T: R^m \to R^n$; $n < m$

(d) $T: R^n \to R^n$; $R(T) = R^n$

Answer: 

(a)  T is  one-to-one 

(b)  T is  not  one-to-one 

(c)  T is  not  one-to-one 

(d)  T is  one-to-one 

8.  In  each  part,  determine  whether  the  linear  transformation  T is  one-to-one. 

(a) $T: P_2 \to P_3$, where $T(a_0 + a_1 x + a_2 x^2) = x(a_0 + a_1 x + a_2 x^2)$

(b) $T: P_2 \to P_2$, where $T(p(x)) = p(x + 1)$

9. Prove: If V and W are finite-dimensional vector spaces such that $\dim(W) < \dim(V)$, then there is no one-to-one linear
transformation $T: V \to W$.

10. Prove: There can be an onto linear transformation from V to W only if $\dim(V) \geq \dim(W)$.

11. (a) Find an isomorphism between the vector space of all 3 × 3 symmetric matrices and $R^6$.

(b) Find two different isomorphisms between the vector space of all 2 × 2 matrices and $R^4$.

(c) Find an isomorphism between the vector space of all polynomials of degree at most 3 such that $p(0) = 0$ and $R^3$.

(d) Find an isomorphism between the vector spaces $\text{span}\{1, \sin(x), \cos(x)\}$ and $R^3$.

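Answer:

(a) \[ T\left( \begin{bmatrix} a & b & c \\ b & d & e \\ c & e & f \end{bmatrix} \right) = \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix} \]

(b) \[ T_1\left( \begin{bmatrix} a & b \\ c & d \end{bmatrix} \right) = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}, \quad T_2\left( \begin{bmatrix} a & b \\ c & d \end{bmatrix} \right) = \begin{bmatrix} a \\ c \\ b \\ d \end{bmatrix} \]

(c) \[ T(ax^3 + bx^2 + cx) = \begin{bmatrix} a \\ b \\ c \end{bmatrix} \]

(d) \[ T(a + b\sin(x) + c\cos(x)) = \begin{bmatrix} a \\ b \\ c \end{bmatrix} \]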

12. (Calculus required) Let $J: P_1 \to R$ be the integration transformation $J(\mathbf{p}) = \int_{-1}^{1} p(x)\,dx$. Determine whether J is
one-to-one. Justify your conclusion.

13. (Calculus required) Let V be the vector space $C^1[0, 1]$ and let $T: V \to R$ be defined by

\[ T(\mathbf{f}) = f(0) + 2f'(0) + 3f'(1) \]

Verify that T is a linear transformation. Determine whether T is one-to-one, and justify your conclusion.

Answer:

T is not one-to-one since, for example, $f(x) = x^2(x - 1)^2$ is in its kernel.

14. (Calculus required) Devise a method for using matrix multiplication to differentiate functions in the vector space
$\text{span}\{1, \sin(x), \cos(x), \sin(2x), \cos(2x)\}$. Use your method to find the derivative of
$3 - 4\sin(x) + \sin(2x) + 5\cos(2x)$.

15. Does the formula $T(a, b, c) = ax^2 + bx + c$ define a one-to-one linear transformation from $R^3$ to $P_2$? Explain your
reasoning.

Answer: 


Yes;  it  is  one-to-one 

16. Let E be a fixed 2 × 2 elementary matrix. Does the formula $T(A) = EA$ define a one-to-one linear operator on $M_{22}$?
Explain your reasoning.

17. Let a be a fixed vector in $R^3$. Does the formula $T(\mathbf{v}) = \mathbf{a} \times \mathbf{v}$ define a one-to-one linear operator on $R^3$? Explain your
reasoning.

Answer:

T is not one-to-one since, for example, a is in its kernel.

18. Prove that an inner product space isomorphism preserves angles and distances; that is, the angle between u and v in V
is equal to the angle between $T(\mathbf{u})$ and $T(\mathbf{v})$ in W, and $\|\mathbf{u} - \mathbf{v}\|_V = \|T(\mathbf{u}) - T(\mathbf{v})\|_W$.

19.  Does  an  inner  product  space  isomorphism  map  orthonormal  sets  to  orthonormal  sets?  Justify  your  answer. 

Answer: 


Yes 

20. Find an inner product space isomorphism between $P_5$ and $M_{23}$.

True-False  Exercises 


In  parts  (a)-(f)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a) The vector spaces $R^2$ and $P_2$ are isomorphic.

Answer: 

False 

(b)  If  the  kernel  of  a linear  transformation  T:  P^  — ► P%  is  {0}  , then  T is  an  isomorphism. 


Answer: 


True 

(c) Every linear transformation from $M_{33}$ to $P_9$ is an isomorphism.

Answer: 

False 

(d)  There  is  a subspace  of  M22  that  is  isomorphic  to 
Answer: 

True 

(e) There is a 2 × 2 matrix P such that $T: M_{22} \to M_{22}$ defined by $T(A) = AP - PA$ is an isomorphism.
Answer: 

False 

(f)  There  is  a linear  transformation  T:  P4  — ► P4  such  that  the  kernel  of  T is  isomorphic  to  the  range  of  T. 
Answer: 

False 
8.3 Compositions and Inverse Transformations

In  Section  4.10  we  discussed  compositions  and  inverses  of  matrix  transformations.  In  this  section  we  will 
extend  some  of  those  ideas  to  general  linear  transformations. 


Composition  of  Linear  Transformations 

The  following  definition  extends  Formula  1 of  Section  4.10  to  general  linear  transformations. 

Note that the word “with” establishes the order
of the operations in a composition. The
composition of $T_2$ with $T_1$ is

\[ (T_2 \circ T_1)(\mathbf{u}) = T_2(T_1(\mathbf{u})) \]

whereas the composition of $T_1$ with $T_2$ is

\[ (T_1 \circ T_2)(\mathbf{u}) = T_1(T_2(\mathbf{u})) \]


DEFINITION  1 

If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, then the composition of $T_2$ with $T_1$,
denoted by $T_2 \circ T_1$ (which is read “$T_2$ circle $T_1$”), is the function defined by the formula

\[ (T_2 \circ T_1)(\mathbf{u}) = T_2(T_1(\mathbf{u})) \tag{1} \]

where u is a vector in U.


Observe that this definition requires that the domain of $T_2$ (which is V) contain the range of $T_1$.
This is essential for the formula $T_2(T_1(\mathbf{u}))$ to make sense (Figure 8.3.1).

Figure 8.3.1 The composition of $T_2$ with $T_1$.


Our  first  theorem  shows  that  the  composition  of  two  linear  transformations  is  itself  a linear  transformation. 


THEOREM  8.3.1 


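If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, then $(T_2 \circ T_1): U \to W$ is also a linear
transformation.


Proof If u and v are vectors in U and c is a scalar, then it follows from (1) and the linearity of $T_1$ and $T_2$ that

\[ (T_2 \circ T_1)(\mathbf{u} + \mathbf{v}) = T_2(T_1(\mathbf{u} + \mathbf{v})) = T_2(T_1(\mathbf{u}) + T_1(\mathbf{v})) \]
\[ = T_2(T_1(\mathbf{u})) + T_2(T_1(\mathbf{v})) \]
\[ = (T_2 \circ T_1)(\mathbf{u}) + (T_2 \circ T_1)(\mathbf{v}) \]

and

\[ (T_2 \circ T_1)(c\mathbf{u}) = T_2(T_1(c\mathbf{u})) = T_2(cT_1(\mathbf{u})) \]
\[ = cT_2(T_1(\mathbf{u})) = c(T_2 \circ T_1)(\mathbf{u}) \]

Thus, $T_2 \circ T_1$ satisfies the two requirements of a linear transformation.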


EXAMPLE  1 Composition  of  Linear  Transformations 

Let $T_1: P_1 \to P_2$ and $T_2: P_2 \to P_2$ be the linear transformations given by the formulas

\[ T_1(p(x)) = xp(x) \quad \text{and} \quad T_2(p(x)) = p(2x + 4) \]

Then the composition $(T_2 \circ T_1): P_1 \to P_2$ is given by the formula

\[ (T_2 \circ T_1)(p(x)) = T_2(T_1(p(x))) = T_2(xp(x)) = (2x + 4)p(2x + 4) \]

In particular, if $p(x) = c_0 + c_1 x$, then

\[ (T_2 \circ T_1)(p(x)) = (T_2 \circ T_1)(c_0 + c_1 x) = (2x + 4)(c_0 + c_1(2x + 4)) \]
\[ = c_0(2x + 4) + c_1(2x + 4)^2 \]
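
The composition in this example can be reproduced with polynomials stored as coefficient lists (constant term first). This is a sketch under that representation; the helpers `poly_add`, `poly_mul`, `substitute`, and `compose` are hypothetical names introduced for the illustration.

```python
def poly_add(p, q):
    """Sum of two coefficient lists (constant term first)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def poly_mul(p, q):
    """Product of two nonempty coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def substitute(p, q):
    """Return p(q(x)) by Horner's rule; p must be nonempty."""
    result = [p[-1]]
    for a in reversed(p[:-1]):
        result = poly_add(poly_mul(result, q), [a])
    return result

def T1(p):
    return [0] + list(p)            # p(x) -> x p(x)

def T2(p):
    return substitute(p, [4, 2])    # p(x) -> p(2x + 4)

def compose(f, g):
    return lambda u: f(g(u))        # (f o g)(u) = f(g(u))

c0, c1 = 1, 3                       # p(x) = 1 + 3x
lhs = compose(T2, T1)([c0, c1])
# Closed form from the example: c0 (2x + 4) + c1 (2x + 4)^2
rhs = poly_add(poly_mul([c0], [4, 2]), poly_mul([c1], poly_mul([4, 2], [4, 2])))
print(lhs, rhs)                     # [52, 50, 12] [52, 50, 12]
```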


EXAMPLE  2 Composition  with  the  Identity  Operator 

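If $T: V \to V$ is any linear operator, and if $I: V \to V$ is the identity operator (Example 3 of Section
8.1), then for all vectors v in V, we have

\[ (T \circ I)(\mathbf{v}) = T(I(\mathbf{v})) = T(\mathbf{v}) \]

\[ (I \circ T)(\mathbf{v}) = I(T(\mathbf{v})) = T(\mathbf{v}) \]

It follows that $T \circ I$ and $I \circ T$ are the same as T; that is,

\[ T \circ I = T \quad \text{and} \quad I \circ T = T \tag{2} \]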


As illustrated in Figure 8.3.2, compositions can be defined for more than two linear transformations. For
example, if

\[ T_1: U \to V, \quad T_2: V \to W, \quad \text{and} \quad T_3: W \to Y \]

are linear transformations, then the composition $T_3 \circ T_2 \circ T_1$ is defined by

\[ (T_3 \circ T_2 \circ T_1)(\mathbf{u}) = T_3(T_2(T_1(\mathbf{u}))) \tag{3} \]

Figure 8.3.2 The composition of three linear transformations.


Inverse  Linear  Transformations 

In Theorem 4.10.1 we showed that a matrix operator $T_A: R^n \to R^n$ is one-to-one if and only if the matrix A is
invertible, in which case the inverse operator is $T_{A^{-1}}$. We then showed that if w is the image of a vector x
under the operator $T_A$, then x is the image under $T_{A^{-1}}$ of the vector w (see Figure 4.10.8). Our next objective
is to extend the notion of invertibility to general linear transformations.

Recall that if $T: V \to W$ is a linear transformation, then the range of T, denoted by $R(T)$, is the subspace of W
consisting of all images under T of vectors in V. If T is one-to-one, then each vector w in $R(T)$ is the image of
a unique vector v in V. This uniqueness allows us to define a new function, called the inverse of T and
denoted by $T^{-1}$, that maps w back into v (Figure 8.3.3).

Figure 8.3.3 The inverse of T maps $T(\mathbf{v})$ back into v.

It can be proved (Exercise 19) that $T^{-1}: R(T) \to V$ is a linear transformation. Moreover, it follows from the
definition of $T^{-1}$ that

\[ T^{-1}(T(\mathbf{v})) = T^{-1}(\mathbf{w}) = \mathbf{v} \tag{4} \]

\[ T(T^{-1}(\mathbf{w})) = T(\mathbf{v}) = \mathbf{w} \tag{5} \]

so that T and $T^{-1}$, when applied in succession in either order, cancel the effect of each other.

It is important to note that if $T: V \to W$ is a one-to-one linear transformation, then the domain of
$T^{-1}$ is the range of T, where the range may or may not be all of W. However, in the special case where
$T: V \to V$ is a one-to-one linear operator and V is n-dimensional, it follows from Theorem 8.2.2 that T
must also be onto, so the domain of $T^{-1}$ is all of V.

EXAMPLE  3 An  Inverse  Transformation 

In Example 5 of Section 8.2 we showed that the linear transformation $T: P_n \to P_{n+1}$ given by

\[ T(\mathbf{p}) = T(p(x)) = xp(x) \]

is one-to-one; thus, T has an inverse. In this case the range of T is not all of $P_{n+1}$ but rather the
subspace of $P_{n+1}$ consisting of polynomials with a zero constant term. This is evident from the
formula for T:

\[ T(c_0 + c_1 x + \cdots + c_n x^n) = c_0 x + c_1 x^2 + \cdots + c_n x^{n+1} \]

It follows that $T^{-1}: R(T) \to P_n$ is given by the formula

\[ T^{-1}(c_0 x + c_1 x^2 + \cdots + c_n x^{n+1}) = c_0 + c_1 x + \cdots + c_n x^n \]

For example, in the case where $n \geq 3$,

\[ T^{-1}(2x - x^2 + 5x^3 + 3x^4) = 2 - x + 5x^2 + 3x^3 \]
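
In terms of coefficient lists (constant term first), T shifts every coefficient up one position and $T^{-1}$ shifts it back, but only for polynomials in the range of T. The sketch below assumes that representation; the names `T` and `T_inverse` are illustrative, and raising an error for inputs outside the range is a design choice of this sketch, not something specified in the text.

```python
def T(p):
    """p(x) -> x p(x): shift every coefficient up one power."""
    return [0] + list(p)

def T_inverse(q):
    """Inverse of T, defined only on polynomials with zero constant term."""
    if not q or q[0] != 0:
        raise ValueError("polynomial is not in the range of T")
    return list(q[1:])

q = [0, 2, -1, 5, 3]            # 2x - x^2 + 5x^3 + 3x^4
print(T_inverse(q))             # [2, -1, 5, 3], that is, 2 - x + 5x^2 + 3x^3
print(T(T_inverse(q)) == q)     # True: applying T after its inverse returns q
```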


EXAMPLE  4 An  Inverse  Transformation 

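Let $T: R^3 \to R^3$ be the linear operator defined by the formula

\[ T(x_1, x_2, x_3) = (3x_1 + x_2, \; -2x_1 - 4x_2 + 3x_3, \; 5x_1 + 4x_2 - 2x_3) \]

Determine whether T is one-to-one; if so, find $T^{-1}(x_1, x_2, x_3)$.

Solution It follows from Formula 12 of Section 4.9 that the standard matrix for T is

\[ [T] = \begin{bmatrix} 3 & 1 & 0 \\ -2 & -4 & 3 \\ 5 & 4 & -2 \end{bmatrix} \]

(verify). This matrix is invertible, and from Formula 7 of Section 4.10 the standard matrix for
$T^{-1}$ is

\[ [T^{-1}] = [T]^{-1} = \begin{bmatrix} 4 & -2 & -3 \\ -11 & 6 & 9 \\ -12 & 7 & 10 \end{bmatrix} \]

It follows that

\[ T^{-1}\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = [T^{-1}] \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 & -2 & -3 \\ -11 & 6 & 9 \\ -12 & 7 & 10 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4x_1 - 2x_2 - 3x_3 \\ -11x_1 + 6x_2 + 9x_3 \\ -12x_1 + 7x_2 + 10x_3 \end{bmatrix} \]

Expressing this result in horizontal notation yields

\[ T^{-1}(x_1, x_2, x_3) = (4x_1 - 2x_2 - 3x_3, \; -11x_1 + 6x_2 + 9x_3, \; -12x_1 + 7x_2 + 10x_3) \]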
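
The inverse found in this example can be verified by multiplying the two standard matrices and by applying the two operators in succession to a sample vector. The following Python sketch does both; the helper names `mat_mul` and `mat_vec` are hypothetical.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_vec(A, x):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

T_std = [[3, 1, 0], [-2, -4, 3], [5, 4, -2]]        # standard matrix of T
T_inv = [[4, -2, -3], [-11, 6, 9], [-12, 7, 10]]    # claimed standard matrix of the inverse
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(mat_mul(T_std, T_inv) == identity)   # True
print(mat_mul(T_inv, T_std) == identity)   # True

x = [1, 2, 3]
w = mat_vec(T_std, x)
print(w, mat_vec(T_inv, w))                # [5, -1, 7] [1, 2, 3]
```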


Composition  of  One-To-One  Linear  Transformations 

The  following  theorem  shows  that  a composition  of  one-to-one  linear  transformations  is  one-to-one,  and  it 
relates  the  inverse  of  a composition  to  the  inverses  of  its  individual  linear  transformations. 


THEOREM  8.3.2 

If $T_1: U \to V$ and $T_2: V \to W$ are one-to-one linear transformations, then

(a) $T_2 \circ T_1$ is one-to-one.

(b) $(T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1}$.


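Proof (a) We want to show that $T_2 \circ T_1$ maps distinct vectors in U into distinct vectors in W. But if u and v
are distinct vectors in U, then $T_1(\mathbf{u})$ and $T_1(\mathbf{v})$ are distinct vectors in V since $T_1$ is one-to-one. This and the
fact that $T_2$ is one-to-one imply that

\[ T_2(T_1(\mathbf{u})) \quad \text{and} \quad T_2(T_1(\mathbf{v})) \]

are also distinct vectors. But these expressions can also be written as

\[ (T_2 \circ T_1)(\mathbf{u}) \quad \text{and} \quad (T_2 \circ T_1)(\mathbf{v}) \]

so $T_2 \circ T_1$ maps u and v into distinct vectors in W.

Proof (b) We want to show that

\[ (T_2 \circ T_1)^{-1}(\mathbf{w}) = (T_1^{-1} \circ T_2^{-1})(\mathbf{w}) \]

for every vector w in the range of $T_2 \circ T_1$. For this purpose, let

\[ \mathbf{u} = (T_2 \circ T_1)^{-1}(\mathbf{w}) \tag{6} \]

so our goal is to show that

\[ \mathbf{u} = (T_1^{-1} \circ T_2^{-1})(\mathbf{w}) \]

But it follows from (6) that

\[ (T_2 \circ T_1)(\mathbf{u}) = \mathbf{w} \]

or, equivalently,

\[ T_2(T_1(\mathbf{u})) = \mathbf{w} \]

Now, taking $T_2^{-1}$ of each side of this equation, then taking $T_1^{-1}$ of each side of the result, and then using (4)
yields (verify)

\[ \mathbf{u} = T_1^{-1}(T_2^{-1}(\mathbf{w})) \]

or, equivalently,

\[ \mathbf{u} = (T_1^{-1} \circ T_2^{-1})(\mathbf{w}) \]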


In words, part (b) of Theorem 8.3.2 states that the inverse of a composition is the composition of the inverses
in the reverse order. This result can be extended to compositions of three or more linear transformations; for
example,

\[ (T_3 \circ T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1} \circ T_3^{-1} \tag{7} \]

In the case where $T_A$, $T_B$, and $T_C$ are matrix operators on $R^n$, Formula 7 can be written as

\[ (T_C \circ T_B \circ T_A)^{-1} = T_A^{-1} \circ T_B^{-1} \circ T_C^{-1} \]

or alternatively as

\[ (T_{CBA})^{-1} = T_{A^{-1}B^{-1}C^{-1}} \tag{8} \]


Note the order of the subscripts on the two
sides of Formula 8.
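
Formula 8 can be spot-checked for specific matrices. The sketch below uses three invertible 2 × 2 matrices chosen arbitrarily for illustration (they do not come from the text) and exact rational arithmetic; the helpers `mat_mul` and `inv2` are hypothetical names.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    """Inverse of an invertible 2 x 2 matrix, using exact fractions."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

# Sample matrices, assumed only for this check.
A = [[1, 2], [3, 5]]
B = [[2, 1], [1, 1]]
C = [[1, 1], [0, 1]]

left = inv2(mat_mul(C, mat_mul(B, A)))               # (CBA)^(-1)
right = mat_mul(inv2(A), mat_mul(inv2(B), inv2(C)))  # A^(-1) B^(-1) C^(-1)
print(left == right)                                 # True
```

Reversing the order on the right-hand side (for instance, forming the product of the inverses of C, B, and A in that order) would in general give a different matrix, which is the point of the note on subscript order.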


Concept  Review 

Composition  of  linear  transformations 
Inverse  of  a linear  transformation 

Skills 

Find  the  domain  and  range  of  the  composition  of  two  linear  transformations. 
Find  the  composition  of  two  linear  transformations. 

Determine  whether  a linear  transformation  has  an  inverse. 

Find  the  inverse  of  a linear  transformation. 


Exercise  Set  8.3 


1. Find $(T_2 \circ T_1)(x, y)$.

(a) $T_1(x, y) = (2x, 3y)$, $T_2(x, y) = (x - y, x + y)$

(b) $T_1(x, y) = (x - 3y, 0)$, $T_2(x, y) = (4x - 5y, 3x - 6y)$

(c) $T_1(x, y) = (2x, -3y, x + y)$, $T_2(x, y, z) = (x - y, y + z)$

(d) $T_1(x, y) = (x - y, y, x)$, $T_2(x, y, z) = (0, x + y + z)$

Answer:


(a) $(T_2 \circ T_1)(x, y) = (2x - 3y, 2x + 3y)$

(b) $(T_2 \circ T_1)(x, y) = (4x - 12y, 3x - 9y)$

(c) $(T_2 \circ T_1)(x, y) = (2x + 3y, x - 2y)$

(d) $(T_2 \circ T_1)(x, y) = (0, 2x)$


2. Find $(T_3 \circ T_2 \circ T_1)(x, y)$.

(a) $T_1(x, y) = (-2y, 3x, x - 2y)$, $T_2(x, y, z) = (y, z, x)$, $T_3(x, y, z) = (x + z, y - z)$

(b) $T_1(x, y) = (x + y, y, -x)$, $T_2(x, y, z) = (0, x + y + z, 3y)$,
$T_3(x, y, z) = (3x + 2y, 4z - x - 3y)$


3. Let $T_1: M_{22} \to R$ and $T_2: M_{22} \to M_{22}$ be the linear transformations given by $T_1(A) = \text{tr}(A)$ and
$T_2(A) = A^T$.

(a) Find $(T_1 \circ T_2)(A)$, where $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.

(b) Can you find $(T_2 \circ T_1)(A)$? Explain.


Answer:

(a) $a + d$

(b) $(T_2 \circ T_1)(A)$ does not exist since $T_1(A)$ is not a 2 × 2 matrix.

4. Let $T_1: P_n \to P_n$ and $T_2: P_n \to P_n$ be the linear operators given by $T_1(p(x)) = p(x - 1)$ and
$T_2(p(x)) = p(x + 1)$. Find $(T_1 \circ T_2)(p(x))$ and $(T_2 \circ T_1)(p(x))$.

5. Let $T_1: V \to V$ be the dilation $T_1(\mathbf{v}) = 4\mathbf{v}$. Find a linear operator $T_2: V \to V$ such that $T_1 \circ T_2 = I$ and
$T_2 \circ T_1 = I$.

Answer:

$T_2(\mathbf{v}) = \tfrac{1}{4}\mathbf{v}$

6. Suppose that the linear transformations $T_1: P_2 \to P_2$ and $T_2: P_2 \to P_3$ are given by the formulas
$T_1(p(x)) = p(x + 1)$ and $T_2(p(x)) = xp(x)$. Find $(T_2 \circ T_1)(a_0 + a_1 x + a_2 x^2)$.

7. Let $q_0(x)$ be a fixed polynomial of degree m, and define a function T with domain $P_n$ by the formula
$T(p(x)) = p(q_0(x))$. Show that T is a linear transformation.

8. Use the definition of $T_3 \circ T_2 \circ T_1$ given by Formula 3 to prove that

(a) $T_3 \circ T_2 \circ T_1$ is a linear transformation.

(b) $T_3 \circ T_2 \circ T_1 = (T_3 \circ T_2) \circ T_1$.

(c) $T_3 \circ T_2 \circ T_1 = T_3 \circ (T_2 \circ T_1)$.

9. Let $T: R^3 \to R^3$ be the orthogonal projection of $R^3$ onto the xy-plane. Show that $T \circ T = T$.

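10. In each part, let $T: R^2 \to R^2$ be multiplication by A. Determine whether T has an inverse; if so, find
$T^{-1}\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right)$.


11. In each part, let $T: R^3 \to R^3$ be multiplication by A. Determine whether T has an inverse; if so, find
$T^{-1}\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right)$.

(a) $A = \begin{bmatrix} 1 & 5 & 2 \\ 1 & 2 & 1 \\ -1 & 1 & 0 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 4 & -1 \\ 1 & 2 & 1 \\ -1 & 1 & 0 \end{bmatrix}$

(c) $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}$

(d) $A = \begin{bmatrix} 1 & -1 & 1 \\ 0 & 2 & -1 \\ 2 & 3 & 0 \end{bmatrix}$


Answer:

(a) T has no inverse.

(b) \[ T^{-1}\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} \tfrac{1}{8}x_1 + \tfrac{1}{8}x_2 - \tfrac{3}{4}x_3 \\ \tfrac{1}{8}x_1 + \tfrac{1}{8}x_2 + \tfrac{1}{4}x_3 \\ -\tfrac{3}{8}x_1 + \tfrac{5}{8}x_2 + \tfrac{1}{4}x_3 \end{bmatrix} \]

(c) \[ T^{-1}\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} \tfrac{1}{2}x_1 - \tfrac{1}{2}x_2 + \tfrac{1}{2}x_3 \\ -\tfrac{1}{2}x_1 + \tfrac{1}{2}x_2 + \tfrac{1}{2}x_3 \\ \tfrac{1}{2}x_1 + \tfrac{1}{2}x_2 - \tfrac{1}{2}x_3 \end{bmatrix} \]

(d) \[ T^{-1}\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} 3x_1 + 3x_2 - x_3 \\ -2x_1 - 2x_2 + x_3 \\ -4x_1 - 5x_2 + 2x_3 \end{bmatrix} \]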

12. In each part, determine whether the linear operator $T: R^n \to R^n$ is one-to-one; if so, find
$T^{-1}(x_1, x_2, \ldots, x_n)$.

(a) $T(x_1, x_2, \ldots, x_n) = (0, x_1, x_2, \ldots, x_{n-1})$

(b) $T(x_1, x_2, \ldots, x_n) = (x_n, x_{n-1}, \ldots, x_2, x_1)$

(c) $T(x_1, x_2, \ldots, x_n) = (x_2, x_3, \ldots, x_n, x_1)$


13. Let $T: R^n \to R^n$ be the linear operator defined by the formula

\[ T(x_1, x_2, \ldots, x_n) = (a_1 x_1, a_2 x_2, \ldots, a_n x_n) \]

where $a_1, a_2, \ldots, a_n$ are constants.

(a) Under what conditions will T have an inverse?

(b) Assuming that the conditions determined in part (a) are satisfied, find a formula for
$T^{-1}(x_1, x_2, \ldots, x_n)$.


Answer:


(a) $a_i \neq 0$ for $i = 1, 2, 3, \ldots, n$

(b) $T^{-1}(x_1, x_2, x_3, \ldots, x_n) = \left( \tfrac{1}{a_1}x_1, \tfrac{1}{a_2}x_2, \tfrac{1}{a_3}x_3, \ldots, \tfrac{1}{a_n}x_n \right)$

14. Let $T_1: R^2 \to R^2$ and $T_2: R^2 \to R^2$ be the linear operators given by the formulas

\[ T_1(x, y) = (x + y, x - y) \quad \text{and} \quad T_2(x, y) = (2x + y, x - 2y) \]

(a) Show that $T_1$ and $T_2$ are one-to-one.

(b) Find formulas for

\[ T_1^{-1}(x, y), \quad T_2^{-1}(x, y), \quad (T_2 \circ T_1)^{-1}(x, y) \]

(c) Verify that $(T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1}$.


15. Let $T_1: P_2 \to P_3$ and $T_2: P_3 \to P_3$ be the linear transformations given by the formulas

\[ T_1(p(x)) = xp(x) \quad \text{and} \quad T_2(p(x)) = p(x + 1) \]

(a) Find formulas for $T_1^{-1}(p(x))$, $T_2^{-1}(p(x))$, and $(T_2 \circ T_1)^{-1}(p(x))$.

(b) Verify that $(T_2 \circ T_1)^{-1} = T_1^{-1} \circ T_2^{-1}$.

Answer:

(a) $T_1^{-1}(p(x)) = \tfrac{1}{x}p(x)$; $T_2^{-1}(p(x)) = p(x - 1)$; $(T_2 \circ T_1)^{-1}(p(x)) = \tfrac{1}{x}p(x - 1)$

16. Let $T_A: R^3 \to R^3$, $T_B: R^3 \to R^3$, and $T_C: R^3 \to R^3$ be the reflections about the xy-plane, the xz-plane, and
the yz-plane, respectively. Verify Formula 8 for these linear operators.

17. Let $T: P_1 \to R^2$ be the function defined by the formula

\[ T(p(x)) = (p(0), p(1)) \]

(a) Find $T(1 - 2x)$.

(b) Show that T is a linear transformation.

(c) Show that T is one-to-one.

(d) Find $T^{-1}(2, 3)$, and sketch its graph.

Answer:

(a) $(1, -1)$

(d) $T^{-1}(2, 3) = 2 + x$

18. Let $T: R^2 \to R^2$ be the linear operator given by the formula $T(x, y) = (x + ky, -y)$. Show that T is one-to-one and that $T^{-1} = T$ for every real value of k.

19. Prove: If $T: V \to W$ is a one-to-one linear transformation, then $T^{-1}: R(T) \to V$ is a one-to-one linear transformation.

In Exercises 20-21, determine whether $T_1 \circ T_2 = T_2 \circ T_1$.

20. (a) $T_1: R^2 \to R^2$ is the orthogonal projection on the x-axis, and $T_2: R^2 \to R^2$ is the orthogonal projection on the y-axis.

(b) $T_1: R^2 \to R^2$ is the rotation about the origin through an angle $\theta_1$, and $T_2: R^2 \to R^2$ is the rotation about the origin through an angle $\theta_2$.

(c) $T_1: R^3 \to R^3$ is the rotation about the x-axis through an angle $\theta_1$, and $T_2: R^3 \to R^3$ is the rotation about the z-axis through an angle $\theta_2$.

21. (a) $T_1: R^2 \to R^2$ is the reflection about the x-axis, and $T_2: R^2 \to R^2$ is the reflection about the y-axis.

(b) $T_1: R^2 \to R^2$ is the orthogonal projection on the x-axis, and $T_2: R^2 \to R^2$ is the counterclockwise rotation through an angle $\theta$.

(c) $T_1: R^3 \to R^3$ is a dilation by a factor k, and $T_2: R^3 \to R^3$ is the counterclockwise rotation about the z-axis through an angle $\theta$.


Answer:


(a) $T_1 \circ T_2 = T_2 \circ T_1$

(b) $T_1 \circ T_2 \neq T_2 \circ T_1$

(c) $T_1 \circ T_2 = T_2 \circ T_1$


22. (Calculus required) Let

\[ D(\mathbf{f}) = f'(x) \quad\text{and}\quad J(\mathbf{f}) = \int_0^x f(t)\,dt \]

be the linear transformations in Examples 11 and 12 of Section 8.1. Find $(J \circ D)(\mathbf{f})$ for

(a) $\mathbf{f}(x) = x^2 + 3x + 2$

(b) $\mathbf{f}(x) = \sin x$

(c) $\mathbf{f}(x) = e^x + 3$


23. (Calculus required) The Fundamental Theorem of Calculus implies that integration and differentiation reverse the actions of each other. Define a transformation $D: P_n \to P_{n-1}$ by $D(p(x)) = p'(x)$, and define $J: P_{n-1} \to P_n$ by

\[ J(p(x)) = \int_0^x p(t)\,dt \]

(a) Show that D and J are linear transformations.

(b) Explain why J is not the inverse transformation of D.

(c) Can the domains and/or codomains of D and J be restricted so they are inverse linear transformations?


True-False  Exercises 


In  parts  (a)-(f)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  The  composition  of  two  linear  transformations  is  also  a linear  transformation. 
Answer: 

True 

(b) If $T_1: V \to V$ and $T_2: V \to V$ are any two linear operators, then $T_1 \circ T_2 = T_2 \circ T_1$.
Answer: 

False 

(c)  The  inverse  of  a linear  transformation  is  a linear  transformation. 


Answer: 


False 

(d) If a linear transformation T has an inverse, then the kernel of T is the zero subspace.

Answer: 

True 

(e) If $T: R^2 \to R^2$ is the orthogonal projection onto the x-axis, then $T^{-1}: R^2 \to R^2$ maps each point on the x-axis onto a line that is perpendicular to the x-axis.

Answer: 

False 

(f) If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, and if $T_1$ is not one-to-one, then neither is $T_2 \circ T_1$.

Answer: 

True

8.4 Matrices for General Linear Transformations

In this section we will show that a general linear transformation from any n-dimensional vector space V to any m-dimensional vector space W can be performed using an appropriate matrix transformation from $R^n$ to $R^m$. This idea is used in computer computations since computers are well suited for performing matrix computations.


Matrices  of  Linear  Transformations 


Suppose that V is an n-dimensional vector space, W is an m-dimensional vector space, and that $T: V \to W$ is a linear transformation. Suppose further that B is a basis for V, that $B'$ is a basis for W, and that for each vector x in V, the coordinate matrices for x and T(x) are $[\mathbf{x}]_B$ and $[T(\mathbf{x})]_{B'}$, respectively (Figure 8.4.1).


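[Figure 8.4.1: T maps a vector x in V (n-dimensional) to T(x) in W (m-dimensional); the corresponding coordinate vectors are $[\mathbf{x}]_B$ in $R^n$ and $[T(\mathbf{x})]_{B'}$ in $R^m$.]

Figure 8.4.1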

It will be our goal to find an $m \times n$ matrix A such that multiplication by A maps the vector $[\mathbf{x}]_B$ into the vector $[T(\mathbf{x})]_{B'}$ for each x in V (Figure 8.4.2a). If we can do so, then, as illustrated in Figure 8.4.2b, we will be able to execute the linear transformation T by using matrix multiplication and the following indirect procedure:


Finding T(x) Indirectly


Step 1. Compute the coordinate vector $[\mathbf{x}]_B$.

Step 2. Multiply $[\mathbf{x}]_B$ on the left by A to produce $[T(\mathbf{x})]_{B'}$.

Step 3. Reconstruct T(x) from its coordinate vector $[T(\mathbf{x})]_{B'}$.


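[Figure 8.4.2: (a) T maps V into W, while multiplication by A maps $R^n$ into $R^m$, taking $[\mathbf{x}]_B$ to $[T(\mathbf{x})]_{B'}$. (b) Direct computation takes x to T(x); the indirect route is (1) pass from x to $[\mathbf{x}]_B$, (2) multiply by A to obtain $[T(\mathbf{x})]_{B'}$, (3) reconstruct T(x).]

Figure 8.4.2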


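The key to executing this plan is to find an $m \times n$ matrix A with the property that

\[ A[\mathbf{x}]_B = [T(\mathbf{x})]_{B'} \tag{1} \]

For this purpose, let $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ be a basis for the n-dimensional space V and $B' = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m\}$ a basis for the m-dimensional space W. Since Equation 1 must hold for all vectors in V, it must hold, in particular, for the basis vectors in B; that is,

\[ A[\mathbf{u}_1]_B = [T(\mathbf{u}_1)]_{B'}, \quad A[\mathbf{u}_2]_B = [T(\mathbf{u}_2)]_{B'}, \quad \ldots, \quad A[\mathbf{u}_n]_B = [T(\mathbf{u}_n)]_{B'} \tag{2} \]

But

\[ [\mathbf{u}_1]_B = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad [\mathbf{u}_2]_B = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad [\mathbf{u}_n]_B = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \]

so the product $A[\mathbf{u}_j]_B$ is the jth column of A for $j = 1, 2, \ldots, n$. Substituting these results into 2 shows that the successive columns of A are the coordinate vectors of $T(\mathbf{u}_1), T(\mathbf{u}_2), \ldots, T(\mathbf{u}_n)$ relative to the basis $B'$. Thus, the matrix A is

\[ A = \big[\,[T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'} \mid \cdots \mid [T(\mathbf{u}_n)]_{B'}\,\big] \tag{3} \]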

We will call this the matrix for T relative to the bases B and $B'$ and will denote it by the symbol $[T]_{B',B}$. Using this notation, Formula 3 can be written as

\[ [T]_{B',B} = \big[\,[T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'} \mid \cdots \mid [T(\mathbf{u}_n)]_{B'}\,\big] \tag{4} \]

and from 1, this matrix has the property

\[ [T]_{B',B}[\mathbf{x}]_B = [T(\mathbf{x})]_{B'} \tag{5} \]

We leave it as an exercise to show that in the special case where $T_A: R^n \to R^m$ is multiplication by A, and where B and $B'$ are the standard bases for $R^n$ and $R^m$, respectively, then

\[ [T_A]_{B',B} = A \tag{6} \]

Observe that in the notation $[T]_{B',B}$ the right subscript is a basis for the domain of T, and the left subscript is a basis for the image space of T (Figure 8.4.3). Moreover, observe how the subscript B seems to "cancel out" in Formula 5 (Figure 8.4.4).


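[Figure 8.4.3: in $[T]_{B',B}$, the left subscript $B'$ is the basis for the image space and the right subscript B is the basis for the domain.]

Figure 8.4.3


[Figure 8.4.4: cancellation of the subscript B in $[T]_{B',B}[\mathbf{x}]_B = [T(\mathbf{x})]_{B'}$.]

Figure 8.4.4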


EXAMPLE  1 Matrix  for  a Linear  Transformation 

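Let $T: P_1 \to P_2$ be the linear transformation defined by

\[ T(p(x)) = xp(x) \]

Find the matrix for T with respect to the standard bases

\[ B = \{\mathbf{u}_1, \mathbf{u}_2\} \quad\text{and}\quad B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\} \]

where

\[ \mathbf{u}_1 = 1, \quad \mathbf{u}_2 = x; \qquad \mathbf{v}_1 = 1, \quad \mathbf{v}_2 = x, \quad \mathbf{v}_3 = x^2 \]

Solution

From the given formula for T we obtain

\[ T(\mathbf{u}_1) = T(1) = (x)(1) = x \]

\[ T(\mathbf{u}_2) = T(x) = (x)(x) = x^2 \]

By inspection, the coordinate vectors for $T(\mathbf{u}_1)$ and $T(\mathbf{u}_2)$ relative to $B'$ are

\[ [T(\mathbf{u}_1)]_{B'} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad [T(\mathbf{u}_2)]_{B'} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \]

Thus, the matrix for T with respect to B and $B'$ is

\[ [T]_{B',B} = \big[\,[T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'}\,\big] = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \]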


EXAMPLE  2 The  Three-Step  Procedure 


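Let $T: P_1 \to P_2$ be the linear transformation in Example 1, and use the three-step procedure described in the following figure to perform the computation

\[ T(a + bx) = x(a + bx) = ax + bx^2 \]

[Figure: direct computation takes x to T(x); the indirect route is (1) pass from x to $[\mathbf{x}]_B$, (2) multiply by $[T]_{B',B}$ to obtain $[T(\mathbf{x})]_{B'}$, (3) reconstruct T(x).]


Solution

The coordinate matrix for $\mathbf{x} = a + bx$ relative to the basis $B = \{1, x\}$ is

\[ [\mathbf{x}]_B = \begin{bmatrix} a \\ b \end{bmatrix} \]

Multiplying $[\mathbf{x}]_B$ by the matrix $[T]_{B',B}$ found in Example 1 we obtain

\[ [T]_{B',B}[\mathbf{x}]_B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ a \\ b \end{bmatrix} = [T(\mathbf{x})]_{B'} \]

Reconstructing $T(\mathbf{x}) = T(a + bx)$ from $[T(\mathbf{x})]_{B'}$ we obtain

\[ T(a + bx) = 0 + ax + bx^2 = ax + bx^2 \]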


Although  Example  2 is  simple,  the  procedure  that  it 
illustrates  is  applicable  to  problems  of  great 
complexity. 
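
The three-step procedure of Example 2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, and the convention of storing a polynomial as a list of coefficients in order of increasing degree, are assumptions made here and are not notation from the text.

```python
# Assumption: a + bx is stored as [a, b] (coordinates relative to B = {1, x}),
# and a polynomial in P2 as [c0, c1, c2] (coordinates relative to B' = {1, x, x^2}).

def mat_vec(A, v):
    """Multiply the matrix A (a list of rows) by the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Matrix [T]_{B',B} from Example 1 for T(p(x)) = x p(x).
T_MATRIX = [[0, 0],
            [1, 0],
            [0, 1]]

def apply_T_indirect(a, b):
    coords = [a, b]                    # Step 1: coordinate vector [x]_B
    return mat_vec(T_MATRIX, coords)   # Step 2: [T(x)]_{B'}; Step 3 reads off the coefficients

def apply_T_direct(a, b):
    # x(a + bx) = 0 + a x + b x^2
    return [0, a, b]

print(apply_T_indirect(4, -7))   # coefficients of T(4 - 7x)
print(apply_T_direct(4, -7))
```

Both calls return the same coefficient list, which is the content of Formula 5 for this particular T.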


EXAMPLE  3 Matrix  for  a Linear  Transformation 


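Let $T: R^2 \to R^3$ be the linear transformation defined by

\[ T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_2 \\ -5x_1 + 13x_2 \\ -7x_1 + 16x_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -5 & 13 \\ -7 & 16 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

Find the matrix for the transformation T with respect to the bases $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ for $R^2$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ for $R^3$, where

\[ \mathbf{u}_1 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 5 \\ 2 \end{bmatrix}; \qquad \mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} \]

Solution

From the formula for T,

\[ T(\mathbf{u}_1) = \begin{bmatrix} 1 \\ -2 \\ -5 \end{bmatrix}, \quad T(\mathbf{u}_2) = \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix} \]

Expressing these vectors as linear combinations of $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$, we obtain (verify)

\[ T(\mathbf{u}_1) = \mathbf{v}_1 - 2\mathbf{v}_3, \quad T(\mathbf{u}_2) = 3\mathbf{v}_1 + \mathbf{v}_2 - \mathbf{v}_3 \]

Thus,

\[ [T(\mathbf{u}_1)]_{B'} = \begin{bmatrix} 1 \\ 0 \\ -2 \end{bmatrix}, \quad [T(\mathbf{u}_2)]_{B'} = \begin{bmatrix} 3 \\ 1 \\ -1 \end{bmatrix} \]

so

\[ [T]_{B',B} = \big[\,[T(\mathbf{u}_1)]_{B'} \mid [T(\mathbf{u}_2)]_{B'}\,\big] = \begin{bmatrix} 1 & 3 \\ 0 & 1 \\ -2 & -1 \end{bmatrix} \]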
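
The "(verify)" step in Example 3 amounts to solving a linear system for each basis vector: the coordinates of $T(\mathbf{u}_j)$ relative to $B'$ are the solution c of $V\mathbf{c} = T(\mathbf{u}_j)$, where V has $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ as its columns. The sketch below does this with exact rational arithmetic; the helper name `solve` and the data layout are assumptions made for illustration.

```python
from fractions import Fraction

def solve(M, b):
    """Solve the square system M c = b by Gauss-Jordan elimination (exact arithmetic).
    Assumes M is invertible."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def T(x1, x2):
    # The transformation of Example 3.
    return [x2, -5 * x1 + 13 * x2, -7 * x1 + 16 * x2]

u = [(3, 1), (5, 2)]                        # basis B
v = [(1, 0, -1), (-1, 2, 2), (0, 1, 2)]     # basis B'
V = [[v[j][i] for j in range(3)] for i in range(3)]   # columns are v1, v2, v3

# Column j of [T]_{B',B} is the coordinate vector of T(u_j) relative to B'.
columns = [solve(V, T(*uj)) for uj in u]
# The entries happen to be integers here, so int() is exact.
matrix = [[int(columns[j][i]) for j in range(2)] for i in range(3)]
print(matrix)
```

Under these assumptions the printed rows reproduce the matrix obtained above by inspection.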


Example 3 illustrates that a fixed linear transformation generally has multiple representations, each depending on the bases chosen. In this case the matrices

\[ [T] = \begin{bmatrix} 0 & 1 \\ -5 & 13 \\ -7 & 16 \end{bmatrix} \quad\text{and}\quad [T]_{B',B} = \begin{bmatrix} 1 & 3 \\ 0 & 1 \\ -2 & -1 \end{bmatrix} \]

both represent the transformation T, the first relative to the standard bases for $R^2$ and $R^3$, the second relative to the bases B and $B'$ stated in the example.


Matrices  of  Linear  Operators 

In the special case where V = W (so that $T: V \to V$ is a linear operator), it is usual to take $B = B'$ when constructing a matrix for T. In this case the resulting matrix is called the matrix for T relative to the basis B and is usually denoted by $[T]_B$ rather than $[T]_{B,B}$. If $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$, then Formulas 4 and 5 become

\[ [T]_B = \big[\,[T(\mathbf{u}_1)]_B \mid [T(\mathbf{u}_2)]_B \mid \cdots \mid [T(\mathbf{u}_n)]_B\,\big] \tag{7} \]

\[ [T]_B[\mathbf{x}]_B = [T(\mathbf{x})]_B \tag{8} \]

Phrased informally, Formulas 7 and 8 state that the matrix for T, when multiplied by the coordinate vector for x, produces the coordinate vector for T(x).

In the special case where $T: R^n \to R^n$ is a matrix operator, say multiplication by A, and B is the standard basis for $R^n$, then Formula 7 simplifies to

\[ [T]_B = A \tag{9} \]

Matrices of Identity Operators

Recall that the identity operator $I: V \to V$ maps every vector in V into itself, that is, $I(\mathbf{x}) = \mathbf{x}$ for every vector x in V. The following example shows that if V is n-dimensional, then the matrix for I relative to any basis B for V is the $n \times n$ identity matrix.

EXAMPLE 4 Matrices of Identity Operators

If $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ is a basis for a finite-dimensional vector space V, and if $I: V \to V$ is the identity operator on V, then

\[ I(\mathbf{u}_1) = \mathbf{u}_1, \quad I(\mathbf{u}_2) = \mathbf{u}_2, \quad \ldots, \quad I(\mathbf{u}_n) = \mathbf{u}_n \]

Therefore,

\[ [I]_B = \big[\,[I(\mathbf{u}_1)]_B \mid [I(\mathbf{u}_2)]_B \mid \cdots \mid [I(\mathbf{u}_n)]_B\,\big] = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I \]


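EXAMPLE 5 Linear Operator on P2


Let $T: P_2 \to P_2$ be the linear operator defined by

\[ T(p(x)) = p(3x - 5) \]

(a) Find $[T]_B$ relative to the basis $B = \{1, x, x^2\}$.

(b) Use the indirect procedure to compute $T(1 + 2x + 3x^2)$.

(c) Check the result in (b) by computing $T(1 + 2x + 3x^2)$ directly.


Solution (a)

From the formula for T,

\[ T(1) = 1, \quad T(x) = 3x - 5, \quad T(x^2) = (3x - 5)^2 = 9x^2 - 30x + 25 \]

so

\[ [T(1)]_B = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad [T(x)]_B = \begin{bmatrix} -5 \\ 3 \\ 0 \end{bmatrix}, \quad [T(x^2)]_B = \begin{bmatrix} 25 \\ -30 \\ 9 \end{bmatrix} \]

Thus,

\[ [T]_B = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix} \]

Solution (b)

The coordinate matrix for $\mathbf{p} = 1 + 2x + 3x^2$ relative to the basis $B = \{1, x, x^2\}$ is

\[ [\mathbf{p}]_B = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \]

Multiplying $[\mathbf{p}]_B$ by the matrix $[T]_B$ found in part (a) we obtain

\[ [T]_B[\mathbf{p}]_B = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 66 \\ -84 \\ 27 \end{bmatrix} = [T(\mathbf{p})]_B \]

Reconstructing $T(\mathbf{p}) = T(1 + 2x + 3x^2)$ from $[T(\mathbf{p})]_B$ we obtain

\[ T(1 + 2x + 3x^2) = 66 - 84x + 27x^2 \]

Solution (c)

By direct computation,

\[ T(1 + 2x + 3x^2) = 1 + 2(3x - 5) + 3(3x - 5)^2 = 1 + 6x - 10 + 27x^2 - 90x + 75 = 66 - 84x + 27x^2 \]

which agrees with the result in (b).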
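
The agreement between parts (b) and (c) of Example 5 can also be checked mechanically. In the sketch below a polynomial in $P_2$ is stored as a coefficient list in order of increasing degree; that convention and the helper names are assumptions made for illustration only.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def substitute(p, inner):
    """Return the coefficients of p(inner(x)); assumes inner has degree 1."""
    result = [0] * len(p)
    power = [1]                        # inner(x) raised to the power 0
    for c in p:
        for k, a in enumerate(power):
            result[k] += c * a
        power = poly_mul(power, inner)
    return result

INNER = [-5, 3]                        # the polynomial 3x - 5

def T_direct(p):
    return substitute(p, INNER)

# Columns of [T]_B are the coordinate vectors of T(1), T(x), T(x^2).
basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
columns = [T_direct(e) for e in basis]
T_B = [[columns[j][i] for j in range(3)] for i in range(3)]

def T_indirect(p):
    # Formula 8: multiply the coordinate vector of p by [T]_B.
    return [sum(a * x for a, x in zip(row, p)) for row in T_B]

p = [1, 2, 3]                          # 1 + 2x + 3x^2
print(T_B)
print(T_indirect(p), T_direct(p))
```

The matrix is built column by column exactly as in Formula 7, so the same sketch works for any other substitution of degree 1 by changing `INNER`.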


Matrices  of  Compositions  and  Inverse  Transformations 

We  will  conclude  this  section  by  mentioning  two  theorems  without  proof  that  are  generalizations  of  Formulas  4 and  7 of 
Section  4.10. 


THEOREM  8.4.1 

If $T_1: U \to V$ and $T_2: V \to W$ are linear transformations, and if B, $B''$, and $B'$ are bases for U, V, and W, respectively, then

\[ [T_2 \circ T_1]_{B',B} = [T_2]_{B',B''}[T_1]_{B'',B} \tag{10} \]


THEOREM  8.4.2 

If $T: V \to V$ is a linear operator, and if B is a basis for V, then the following are equivalent.

(a) T is one-to-one.

(b) $[T]_B$ is invertible.

Moreover, when these equivalent conditions hold,

\[ [T^{-1}]_B = [T]_B^{-1} \tag{11} \]


In 10, observe how the interior subscript $B''$ (the basis for the intermediate space V) seems to "cancel out," leaving only the bases for the domain and image space of the composition as subscripts (Figure 8.4.5). This cancellation of interior subscripts suggests the following extension of Formula 10 to compositions of three linear transformations (Figure 8.4.6):

\[ [T_3 \circ T_2 \circ T_1]_{B',B} = [T_3]_{B',B'''}[T_2]_{B''',B''}[T_1]_{B'',B} \tag{12} \]


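[Figure 8.4.5: cancellation of the interior subscript $B''$ in Formula 10.]

Figure 8.4.5

[Figure 8.4.6: the composition of $T_1$, $T_2$, and $T_3$, with the successive spaces assigned the bases B, $B''$, $B'''$, and $B'$.]

Figure 8.4.6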
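
Formula 11 of Theorem 8.4.2, $[T^{-1}]_B = [T]_B^{-1}$, can be illustrated with the operator $T(p(x)) = p(3x - 5)$ of Example 5. The sketch below assumes that the inverse operator is $T^{-1}(p(x)) = p((x + 5)/3)$ (substituting $3x - 5$ into it returns $p(x)$); this formula is worked out here for illustration and is not stated in the text.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# [T]_B from Example 5, relative to B = {1, x, x^2}.
T_B = [[1, -5, 25],
       [0, 3, -30],
       [0, 0, 9]]

# Assumed inverse operator: p(x) -> p((x + 5)/3).
# Images of 1, x, x^2: 1, (x + 5)/3, (x^2 + 10x + 25)/9.
third = Fraction(1, 3)
T_inv_columns = [
    [1, 0, 0],
    [5 * third, third, 0],
    [25 * third**2, 10 * third**2, third**2],
]
T_inv_B = [[T_inv_columns[j][i] for j in range(3)] for i in range(3)]

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(mat_mul(T_B, T_inv_B) == IDENTITY)
print(mat_mul(T_inv_B, T_B) == IDENTITY)
```

Both products being the identity matrix is what Formula 11 asserts for this operator.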


The  following  example  illustrates  Theorem  8.4.1. 

EXAMPLE  6 Composition 

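Let $T_1: P_1 \to P_2$ be the linear transformation defined by

\[ T_1(p(x)) = xp(x) \]

and let $T_2: P_2 \to P_2$ be the linear operator defined by

\[ T_2(p(x)) = p(3x - 5) \]

Then the composition $(T_2 \circ T_1): P_1 \to P_2$ is given by

\[ (T_2 \circ T_1)(p(x)) = T_2(T_1(p(x))) = T_2(xp(x)) = (3x - 5)p(3x - 5) \]

Thus, if $p(x) = c_0 + c_1x$, then

\[ (T_2 \circ T_1)(c_0 + c_1x) = (3x - 5)(c_0 + c_1(3x - 5)) = c_0(3x - 5) + c_1(3x - 5)^2 \tag{13} \]

In this example, $P_1$ plays the role of U in Theorem 8.4.1, and $P_2$ plays the roles of both V and W; thus we can take $B' = B''$ in 10 so that the formula simplifies to

\[ [T_2 \circ T_1]_{B',B} = [T_2]_{B'}[T_1]_{B',B} \tag{14} \]

Let us choose $B = \{1, x\}$ to be the basis for $P_1$ and choose $B' = \{1, x, x^2\}$ to be the basis for $P_2$. We showed in Examples 1 and 5 that

\[ [T_1]_{B',B} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad [T_2]_{B'} = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix} \]

Thus, it follows from 14 that

\[ [T_2 \circ T_1]_{B',B} = \begin{bmatrix} 1 & -5 & 25 \\ 0 & 3 & -30 \\ 0 & 0 & 9 \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -5 & 25 \\ 3 & -30 \\ 0 & 9 \end{bmatrix} \tag{15} \]

As a check, we will calculate $[T_2 \circ T_1]_{B',B}$ directly from Formula 4. Since $B = \{1, x\}$, it follows from Formula 4 with $\mathbf{u}_1 = 1$ and $\mathbf{u}_2 = x$ that

\[ [T_2 \circ T_1]_{B',B} = \big[\,[(T_2 \circ T_1)(1)]_{B'} \mid [(T_2 \circ T_1)(x)]_{B'}\,\big] \tag{16} \]

Using 13 yields

\[ (T_2 \circ T_1)(1) = 3x - 5 \quad\text{and}\quad (T_2 \circ T_1)(x) = (3x - 5)^2 = 9x^2 - 30x + 25 \]

From this and the fact that $B' = \{1, x, x^2\}$, it follows that

\[ [(T_2 \circ T_1)(1)]_{B'} = \begin{bmatrix} -5 \\ 3 \\ 0 \end{bmatrix} \quad\text{and}\quad [(T_2 \circ T_1)(x)]_{B'} = \begin{bmatrix} 25 \\ -30 \\ 9 \end{bmatrix} \]

Substituting in 16 yields

\[ [T_2 \circ T_1]_{B',B} = \begin{bmatrix} -5 & 25 \\ 3 & -30 \\ 0 & 9 \end{bmatrix} \]

which agrees with 15.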
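
The check in Example 6 compares a matrix product with a matrix built column by column from Formula 13. A minimal Python sketch of that comparison follows; the helper names are assumptions made for illustration.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

T1 = [[0, 0],          # [T1]_{B',B} from Example 1
      [1, 0],
      [0, 1]]
T2 = [[1, -5, 25],     # [T2]_{B'} from Example 5
      [0, 3, -30],
      [0, 0, 9]]

# Formula 14: the matrix of the composition is the product of the matrices.
product = mat_mul(T2, T1)

def compose_direct(c0, c1):
    """Coefficients (constant, x, x^2) of c0(3x - 5) + c1(3x - 5)^2, as in Formula 13."""
    return [-5 * c0 + 25 * c1, 3 * c0 - 30 * c1, 9 * c1]

# Formula 16: the columns are the coordinate vectors of the images of 1 and x.
columns = [compose_direct(1, 0), compose_direct(0, 1)]
direct = [[columns[j][i] for j in range(2)] for i in range(3)]

print(product)
print(product == direct)
```

Note the order of the factors: `mat_mul(T1, T2)` is not even defined for these shapes, which mirrors the fact that $T_1 \circ T_2$ is not the composition under discussion.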


Concept  Review 

Matrix  for  a linear  transformation  relative  to  bases 
Matrix  for  a linear  operator  relative  to  a basis 
The  three-step  procedure  for  finding  T(x) 

Skills 

Find the matrix for a linear transformation $T: V \to W$ relative to bases of V and W.

For a linear transformation $T: V \to W$, find T(x) using the matrix for T relative to bases of V and W.


Exercise  Set  8.4 


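1. Let $T: P_2 \to P_3$ be the linear transformation defined by $T(p(x)) = xp(x)$.

(a) Find the matrix for T relative to the standard bases

\[ B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\} \quad\text{and}\quad B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\} \]

where

\[ \mathbf{u}_1 = 1, \quad \mathbf{u}_2 = x, \quad \mathbf{u}_3 = x^2 \]

\[ \mathbf{v}_1 = 1, \quad \mathbf{v}_2 = x, \quad \mathbf{v}_3 = x^2, \quad \mathbf{v}_4 = x^3 \]

(b) Verify that the matrix $[T]_{B',B}$ obtained in part (a) satisfies Formula 5 for every vector $\mathbf{x} = c_0 + c_1x + c_2x^2$ in $P_2$.


Answer:


(a) \[ \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]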


2. Let $T: P_2 \to P_1$ be the linear transformation defined by

\[ T(a_0 + a_1x + a_2x^2) = (a_0 + a_1) - (2a_1 + 3a_2)x \]

(a) Find the matrix for T relative to the standard bases $B = \{1, x, x^2\}$ and $B' = \{1, x\}$ for $P_2$ and $P_1$.

(b) Verify that the matrix $[T]_{B',B}$ obtained in part (a) satisfies Formula 5 for every vector $\mathbf{x} = c_0 + c_1x + c_2x^2$ in $P_2$.


3. Let $T: P_2 \to P_2$ be the linear operator defined by

\[ T(a_0 + a_1x + a_2x^2) = a_0 + a_1(x - 1) + a_2(x - 1)^2 \]

(a) Find the matrix for T relative to the standard basis $B = \{1, x, x^2\}$ for $P_2$.

(b) Verify that the matrix $[T]_B$ obtained in part (a) satisfies Formula 8 for every vector $\mathbf{x} = a_0 + a_1x + a_2x^2$ in $P_2$.


Answer:


(a) \[ \begin{bmatrix} 1 & -1 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{bmatrix} \]


4. Let $T: R^2 \to R^2$ be the linear operator defined by


CM::: 


~*2 
*2 


and let $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ be the basis for which


ui  = 


and  U2  = 


-1 

0 


(a) Find $[T]_B$.

(b) Verify that Formula 8 holds for every vector x in $R^2$.


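5. Let $T: R^2 \to R^3$ be defined by

\[ T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 + 2x_2 \\ -x_1 \\ 0 \end{bmatrix} \]

(a) Find the matrix $[T]_{B',B}$ relative to the bases $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, where

\[ \mathbf{u}_1 = \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} -2 \\ 4 \end{bmatrix}; \qquad \mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 2 \\ 2 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 3 \\ 0 \\ 0 \end{bmatrix} \]

(b) Verify that Formula 5 holds for every vector in $R^2$.

Answer:

(a) \[ \begin{bmatrix} 0 & 0 \\ -\frac{1}{2} & 1 \\ \frac{8}{3} & \frac{4}{3} \end{bmatrix} \]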

6. Let $T: R^3 \to R^3$ be the linear operator defined by

T(x i,X2,*3)  = (*1  -7:3) 

(a) Find the matrix for T with respect to the basis $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, where

\[ \mathbf{v}_1 = (1, 0, 1), \quad \mathbf{v}_2 = (0, 1, 1), \quad \mathbf{v}_3 = (1, 1, 0) \]

(b) Verify that Formula 8 holds for every vector $\mathbf{x} = (x_1, x_2, x_3)$ in $R^3$.

(c) Is T one-to-one? If so, find the matrix of $T^{-1}$ with respect to the basis B.

7. Let $T: P_2 \to P_2$ be the linear operator defined by $T(p(x)) = p(2x + 1)$, that is,

\[ T(c_0 + c_1x + c_2x^2) = c_0 + c_1(2x + 1) + c_2(2x + 1)^2 \]

(a) Find $[T]_B$ with respect to the basis $B = \{1, x, x^2\}$.

(b) Use the three-step procedure illustrated in Example 2 to compute $T(2 - 3x + 4x^2)$.

(c) Check the result obtained in part (b) by computing $T(2 - 3x + 4x^2)$ directly.


Answer:


(a) \[ \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 4 \\ 0 & 0 & 4 \end{bmatrix} \]

(b) $3 + 10x + 16x^2$


8. Let $T: P_2 \to P_3$ be the linear transformation defined by $T(p(x)) = xp(x - 3)$, that is,

\[ T(c_0 + c_1x + c_2x^2) = x\left(c_0 + c_1(x - 3) + c_2(x - 3)^2\right) \]

(a) Find $[T]_{B',B}$ relative to the bases $B = \{1, x, x^2\}$ and $B' = \{1, x, x^2, x^3\}$.

(b) Use the three-step procedure illustrated in Example 2 to compute $T(1 + x - x^2)$.

(c) Check the result obtained in part (b) by computing $T(1 + x - x^2)$ directly.


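9. Let $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} -1 \\ 4 \end{bmatrix}$, and let

\[ A = \begin{bmatrix} 1 & 3 \\ -2 & 5 \end{bmatrix} \]

be the matrix for $T: R^2 \to R^2$ relative to the basis $B = \{\mathbf{v}_1, \mathbf{v}_2\}$.

(a) Find $[T(\mathbf{v}_1)]_B$ and $[T(\mathbf{v}_2)]_B$.

(b) Find $T(\mathbf{v}_1)$ and $T(\mathbf{v}_2)$.

(c) Find a formula for $T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right)$.

(d) Use the formula obtained in (c) to compute $T\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)$.


Answer:


(a) $[T(\mathbf{v}_1)]_B = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$, $[T(\mathbf{v}_2)]_B = \begin{bmatrix} 3 \\ 5 \end{bmatrix}$

(b) $T(\mathbf{v}_1) = \begin{bmatrix} 3 \\ -5 \end{bmatrix}$, $T(\mathbf{v}_2) = \begin{bmatrix} -2 \\ 29 \end{bmatrix}$

(c) \[ T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} \frac{18}{7} & \frac{1}{7} \\ -\frac{107}{7} & \frac{24}{7} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \]

(d) \[ \begin{bmatrix} \frac{19}{7} \\ -\frac{83}{7} \end{bmatrix} \]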


10.  [3-210 

Let  A—  1 6 2 1 

-3  0 7 1 

B'  — |wi , W2,  W3 1 , where 


be  the  matrix  for  relative  to  the  bases  B = {vi , V2,  V3,  V4)  and 


0 

2 

1 

6 

1 

1 

4 

9 

VI  = 

i 

> V2  = 

-1 

, v3  = 

-1 

, V4  = 

4 

1 

-1 

2J 

2 

'o' 

—7 

[-6 

- 

W1  = 

8 

, W2  = 

8 

, W3  = 

9 

8 

1 

1 

(a)  Find  [7(vi)]b*,  [7’(v2)]£',  [T(v3)]£',  and  [7(y4)]b*. 

(b)  Find  T(vi),  7(v2),  7’(v3),  and  7(v4). 

(c) 

Find  a formula  for  T 


f[xf 

\ 

x2 

x3 

*4 

1 

(d) 


Use  the  formula  obtained  in  (c)  to  compute  T 


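11. Let

\[ A = \begin{bmatrix} 1 & 3 & -1 \\ 2 & 0 & 5 \\ 6 & -2 & 4 \end{bmatrix} \]

be the matrix for $T: P_2 \to P_2$ with respect to the basis $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, where

\[ \mathbf{v}_1 = 3x + 3x^2, \quad \mathbf{v}_2 = -1 + 3x + 2x^2, \quad \mathbf{v}_3 = 3 + 7x + 2x^2 \]

(a) Find $[T(\mathbf{v}_1)]_B$, $[T(\mathbf{v}_2)]_B$, and $[T(\mathbf{v}_3)]_B$.

(b) Find $T(\mathbf{v}_1)$, $T(\mathbf{v}_2)$, and $T(\mathbf{v}_3)$.

(c) Find a formula for $T(a_0 + a_1x + a_2x^2)$.

(d) Use the formula obtained in (c) to compute $T(1 + x^2)$.


Answer:


(a) $[T(\mathbf{v}_1)]_B = \begin{bmatrix} 1 \\ 2 \\ 6 \end{bmatrix}$, $[T(\mathbf{v}_2)]_B = \begin{bmatrix} 3 \\ 0 \\ -2 \end{bmatrix}$, $[T(\mathbf{v}_3)]_B = \begin{bmatrix} -1 \\ 5 \\ 4 \end{bmatrix}$

(b) $T(\mathbf{v}_1) = 16 + 51x + 19x^2$, $T(\mathbf{v}_2) = -6 - 5x + 5x^2$, $T(\mathbf{v}_3) = 7 + 40x + 15x^2$

(c) \[ T(a_0 + a_1x + a_2x^2) = \frac{239a_0 - 161a_1 + 289a_2}{24} + \frac{201a_0 - 111a_1 + 247a_2}{8}x + \frac{61a_0 - 31a_1 + 107a_2}{12}x^2 \]

(d) $T(1 + x^2) = 22 + 56x + 14x^2$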


12. Let $T_1: P_1 \to P_2$ be the linear transformation defined by

\[ T_1(p(x)) = xp(x) \]

and let $T_2: P_2 \to P_2$ be the linear operator defined by

\[ T_2(p(x)) = p(2x + 1) \]

Let $B = \{1, x\}$ and $B' = \{1, x, x^2\}$ be the standard bases for $P_1$ and $P_2$.

(a) Find $[T_2 \circ T_1]_{B',B}$, $[T_2]_{B'}$, and $[T_1]_{B',B}$.

(b) State a formula relating the matrices in part (a).

(c) Verify that the matrices in part (a) satisfy the formula you stated in part (b).

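13. Let $T_1: P_1 \to P_2$ be the linear transformation defined by

\[ T_1(c_0 + c_1x) = 2c_0 - 3c_1x \]

and let $T_2: P_2 \to P_3$ be the linear transformation defined by

\[ T_2(c_0 + c_1x + c_2x^2) = 3c_0x + 3c_1x^2 + 3c_2x^3 \]

Let $B = \{1, x\}$, $B'' = \{1, x, x^2\}$, and $B' = \{1, x, x^2, x^3\}$.

(a) Find $[T_2 \circ T_1]_{B',B}$, $[T_2]_{B',B''}$, and $[T_1]_{B'',B}$.

(b) State a formula relating the matrices in part (a).

(c) Verify that the matrices in part (a) satisfy the formula you stated in part (b).

Answer:

(a) \[ [T_2 \circ T_1]_{B',B} = \begin{bmatrix} 0 & 0 \\ 6 & 0 \\ 0 & -9 \\ 0 & 0 \end{bmatrix}, \quad [T_2]_{B',B''} = \begin{bmatrix} 0 & 0 & 0 \\ 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}, \quad [T_1]_{B'',B} = \begin{bmatrix} 2 & 0 \\ 0 & -3 \\ 0 & 0 \end{bmatrix} \]

(b) $[T_2 \circ T_1]_{B',B} = [T_2]_{B',B''}[T_1]_{B'',B}$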

14. Show that if $T: V \to W$ is the zero transformation, then the matrix for T with respect to any bases for V and W is a zero matrix.

15. Show that if $T: V \to V$ is a contraction or a dilation of V (Example 4 of Section 8.1), then the matrix for T relative to any basis for V is a positive scalar multiple of the identity matrix.

16. Let $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ be a basis for a vector space V. Find the matrix with respect to B of the linear operator $T: V \to V$ defined by $T(\mathbf{v}_1) = \mathbf{v}_2$, $T(\mathbf{v}_2) = \mathbf{v}_3$, $T(\mathbf{v}_3) = \mathbf{v}_4$, $T(\mathbf{v}_4) = \mathbf{v}_1$.

17. Prove that if B and $B'$ are the standard bases for $R^n$ and $R^m$, respectively, then the matrix for a linear transformation $T: R^n \to R^m$ relative to the bases B and $B'$ is the standard matrix for T.

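18. (Calculus required) Let $D: P_2 \to P_2$ be the differentiation operator $D(\mathbf{p}) = p'(x)$. In parts (a) and (b), find the matrix of D relative to the basis $B = \{\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3\}$.

(a) $\mathbf{p}_1 = 1$, $\mathbf{p}_2 = x$, $\mathbf{p}_3 = x^2$

(b) $\mathbf{p}_1 = 2$, $\mathbf{p}_2 = 2 - 3x$, $\mathbf{p}_3 = 2 - 3x + 8x^2$

(c) Use the matrix in part (a) to compute $D(6 - 6x + 24x^2)$.

(d) Repeat the directions for part (c) for the matrix in part (b).

19. (Calculus required) In each part, suppose that $B = \{\mathbf{f}_1, \mathbf{f}_2, \mathbf{f}_3\}$ is a basis for a subspace V of the vector space of real-valued functions defined on the real line. Find the matrix with respect to B for the differentiation operator $D: V \to V$.

(a) $\mathbf{f}_1 = 1$, $\mathbf{f}_2 = \sin x$, $\mathbf{f}_3 = \cos x$

(b) $\mathbf{f}_1 = 1$, $\mathbf{f}_2 = e^x$, $\mathbf{f}_3 = e^{2x}$

(c) $\mathbf{f}_1 = e^{2x}$, $\mathbf{f}_2 = xe^{2x}$, $\mathbf{f}_3 = x^2e^{2x}$

(d) Use the matrix in part (c) to compute $D(4e^{2x} + 6xe^{2x} - 10x^2e^{2x})$.


Answer:


(a) \[ \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix} \]

(b) \[ \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \]

(c) \[ \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 2 \end{bmatrix} \]

(d) $14e^{2x} - 8xe^{2x} - 20x^2e^{2x}$ since

\[ \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 2 \end{bmatrix}\begin{bmatrix} 4 \\ 6 \\ -10 \end{bmatrix} = \begin{bmatrix} 14 \\ -8 \\ -20 \end{bmatrix} \]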

20. Let V be a four-dimensional vector space with basis B, let W be a seven-dimensional vector space with basis $B'$, and let $T: V \to W$ be a linear transformation. Identify the four vector spaces that contain the vectors at the corners of the accompanying diagram.

[Figure Ex-20: direct computation takes x to T(x); the indirect route is (1) pass from x to $[\mathbf{x}]_B$, (2) multiply by $[T]_{B',B}$ to obtain $[T(\mathbf{x})]_{B'}$, (3) reconstruct T(x).]

Figure Ex-20


21. In each part, fill in the missing part of the equation.

(a) $[T_2 \circ T_1]_{B',B} = [T_2]_{\underline{\ ?\ }}\,[T_1]_{B'',B}$

(b) $[T_3 \circ T_2 \circ T_1]_{B',B} = [T_3]_{\underline{\ ?\ }}\,[T_2]_{B''',B''}\,[T_1]_{B'',B}$

Answer:

(a) $B', B''$

(b) $B', B'''$

True-False  Exercises 


In  parts  (a)-(e)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 


(a) If the matrix of a linear transformation $T: V \to W$ relative to some bases of V and W is $\begin{bmatrix} 2 & 4 \\ 0 & 3 \end{bmatrix}$, then there is a nonzero vector x in V such that $T(\mathbf{x}) = 2\mathbf{x}$.

Answer:

False

(b) If the matrix of a linear transformation $T: V \to W$ relative to bases for V and W is $\begin{bmatrix} 2 & 4 \\ 0 & 3 \end{bmatrix}$, then there is a nonzero vector x in V such that $T(\mathbf{x}) = 4\mathbf{x}$.

Answer:

False

(c) 

v J If  the  matrix  of  a linear  transformation  X\  V 


W relative  to  certain  bases  for  V and  W is 


4 

3 ’ 


then  T is  one-to-one. 


Answer: 

True 

(d) If $S: V \to V$ and $T: V \to V$ are linear operators and B is a basis for V, then the matrix of $S \circ T$ relative to B is $[T]_B[S]_B$.


Answer: 

False 

(e) If $T: V \to V$ is an invertible linear operator and B is a basis for V, then the matrix for $T^{-1}$ relative to B is $[T]_B^{-1}$.


Answer: 

True

8.5 Similarity

The matrix for a linear operator $T: V \to V$ depends on the basis selected for V. One of the fundamental problems of linear
algebra  is  to  choose  a basis  for  V that  makes  the  matrix  for  T as  simple  as  possible — a diagonal  or  a triangular  matrix,  for 
example.  In  this  section  we  will  study  this  problem. 


Simple  Matrices  for  Linear  Operators 


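Standard bases do not necessarily produce the simplest matrices for linear operators. For example, consider the matrix operator $T: R^2 \to R^2$ whose standard matrix is

\[ [T] = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \tag{1} \]

and view $[T]$ as the matrix for T relative to the standard basis $B = \{\mathbf{e}_1, \mathbf{e}_2\}$ for $R^2$. Let us compare this to the matrix for T relative to the basis $B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$ for $R^2$ in which

\[ \mathbf{u}_1' = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf{u}_2' = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \tag{2} \]

Since

\[ T(\mathbf{u}_1') = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} = 2\mathbf{u}_1' \quad\text{and}\quad T(\mathbf{u}_2') = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \end{bmatrix} = 3\mathbf{u}_2' \]

it follows that

\[ [T(\mathbf{u}_1')]_{B'} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} \quad\text{and}\quad [T(\mathbf{u}_2')]_{B'} = \begin{bmatrix} 0 \\ 3 \end{bmatrix} \]

so the matrix for T relative to the basis $B'$ is

\[ [T]_{B'} = \big[\,[T(\mathbf{u}_1')]_{B'} \mid [T(\mathbf{u}_2')]_{B'}\,\big] = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \]

This matrix, being diagonal, has a simpler form than $[T]$ and conveys clearly that the operator T scales $\mathbf{u}_1'$ by a factor of 2 and $\mathbf{u}_2'$ by a factor of 3, information that is not immediately evident from $[T]$.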
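
The computation above can be reproduced numerically: apply the standard matrix to each vector of the new basis and express the result in that basis. The following sketch does so for the $2 \times 2$ case using Cramer's rule; the helper names `mat_vec` and `coords` are assumptions made for illustration.

```python
from fractions import Fraction

T_STD = [[1, 1],
         [-2, 4]]             # standard matrix of T, Formula 1
u1, u2 = [1, 1], [1, 2]       # the basis vectors of B', Formula 2

def mat_vec(A, v):
    """Multiply the matrix A (a list of rows) by the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def coords(w, b1, b2):
    """Coordinates of w relative to the basis {b1, b2} of R^2, by Cramer's rule."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    c1 = Fraction(w[0] * b2[1] - b2[0] * w[1], det)
    c2 = Fraction(b1[0] * w[1] - w[0] * b1[1], det)
    return [c1, c2]

# Columns of [T]_{B'} are the coordinate vectors of T(u1') and T(u2') relative to B'.
col1 = coords(mat_vec(T_STD, u1), u1, u2)
col2 = coords(mat_vec(T_STD, u2), u1, u2)
T_Bprime = [[col1[0], col2[0]],
            [col1[1], col2[1]]]
print(T_Bprime == [[2, 0], [0, 3]])
```

Replacing `u1` and `u2` by another basis generally gives a matrix that is not diagonal, which is the point of the comparison made in the text.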

One  of  the  major  themes  in  more  advanced  linear  algebra  courses  is  to  determine  the  “simplest  possible  form”  that  can  be 
obtained  for  the  matrix  of  a linear  operator  by  choosing  the  basis  appropriately.  Sometimes  it  is  possible  to  obtain  a 
diagonal  matrix  (as  above,  for  example),  whereas  other  times  one  must  settle  for  a triangular  matrix  or  some  other  form. 

We  will  only  be  able  to  touch  on  this  important  topic  in  this  text. 

The problem of finding a basis that produces the simplest possible matrix for a linear operator $T: V \to V$ can be attacked by
first  finding  a matrix  for  T relative  to  any  basis,  typically  a standard  basis,  where  applicable,  and  then  changing  the  basis  in 
a way  that  simplifies  the  matrix.  Before  pursuing  this  idea,  it  will  be  helpful  to  revisit  some  concepts  about  changing  bases. 


A New  View  of  Transition  Matrices 

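Recall from Formulas 7 and 8 of Section 4.6 that if $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ and $B' = \{\mathbf{u}_1', \mathbf{u}_2', \ldots, \mathbf{u}_n'\}$ are bases for a vector space V, then the transition matrices from B to $B'$ and from $B'$ to B are

\[ P_{B \to B'} = \big[\,[\mathbf{u}_1]_{B'} \mid [\mathbf{u}_2]_{B'} \mid \cdots \mid [\mathbf{u}_n]_{B'}\,\big] \tag{3} \]

\[ P_{B' \to B} = \big[\,[\mathbf{u}_1']_B \mid [\mathbf{u}_2']_B \mid \cdots \mid [\mathbf{u}_n']_B\,\big] \tag{4} \]

where the matrices $P_{B \to B'}$ and $P_{B' \to B}$ are inverses of each other. We also showed in Formulas 9 and 10 of that section that if v is any vector in V, then

\[ P_{B \to B'}[\mathbf{v}]_B = [\mathbf{v}]_{B'} \tag{5} \]

\[ P_{B' \to B}[\mathbf{v}]_{B'} = [\mathbf{v}]_B \tag{6} \]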

The  following  theorem  shows  that  transition  matrices  in  Formulas  3 and  4 can  be  viewed  as  matrices  for  identity  operators. 


THEOREM  8.5.1 

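If B and $B'$ are bases for a finite-dimensional vector space V, and if $I: V \to V$ is the identity operator on V, then

\[ P_{B \to B'} = [I]_{B',B} \quad\text{and}\quad P_{B' \to B} = [I]_{B,B'} \]

Proof Suppose that $B = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n\}$ and $B' = \{\mathbf{u}_1', \mathbf{u}_2', \ldots, \mathbf{u}_n'\}$ are bases for V. Using the fact that $I(\mathbf{v}) = \mathbf{v}$ for all v in V, it follows from Formula 4 of Section 8.4 that

\[ [I]_{B',B} = \big[\,[I(\mathbf{u}_1)]_{B'} \mid [I(\mathbf{u}_2)]_{B'} \mid \cdots \mid [I(\mathbf{u}_n)]_{B'}\,\big] \]

\[ = \big[\,[\mathbf{u}_1]_{B'} \mid [\mathbf{u}_2]_{B'} \mid \cdots \mid [\mathbf{u}_n]_{B'}\,\big] \]

\[ = P_{B \to B'} \qquad \text{[Formula (3) above]} \]

The proof that $[I]_{B,B'} = P_{B' \to B}$ is similar.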

Effect  of  Changing  Bases  on  Matrices  of  Linear  Operators 

We  are  now  ready  to  consider  the  main  problem  in  this  section. 


PROBLEM 

If B and $B'$ are two bases for a finite-dimensional vector space V, and if $T: V \to V$ is a linear operator, what relationship, if any, exists between the matrices $[T]_B$ and $[T]_{B'}$?


The  answer  to  this  question  can  be  obtained  by  considering  the  composition  of  the  three  linear  operators  on  V pictured  in 
Figure  8.5.1. 


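[Figure 8.5.1: the composition of I, T, and I on V, taking v to v, then to T(v), then to T(v); the four copies of V are assigned the bases $B'$, B, B, and $B'$, in that order.]

Figure 8.5.1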

In this figure, v is first mapped into itself by the identity operator, then v is mapped into T(v) by T, and then T(v) is mapped into itself by the identity operator. All four vector spaces involved in the composition are the same (namely, V), but the bases for the spaces vary. Since the starting vector is v and the final vector is T(v), the composition produces the same result as applying T directly; that is,

\[ T = I \circ T \circ I \tag{7} \]

If, as illustrated in Figure 8.5.1, the first and last vector spaces are assigned the basis $B'$ and the middle two spaces are assigned the basis B, then it follows from 7 and Formula 12 of Section 8.4 (with an appropriate adjustment to the names of the bases) that

\[ [T]_{B',B'} = [I \circ T \circ I]_{B',B'} = [I]_{B',B}[T]_{B,B}[I]_{B,B'} \tag{8} \]

or, in simpler notation,

\[ [T]_{B'} = [I]_{B',B}[T]_B[I]_{B,B'} \tag{9} \]

We can simplify this formula even further by using Theorem 8.5.1 to rewrite it as

\[ [T]_{B'} = P_{B \to B'}[T]_B P_{B' \to B} \tag{10} \]


In  summary,  we  have  the  following  theorem. 


THEOREM  8.5.2 

Let $T: V \to V$ be a linear operator on a finite-dimensional vector space V, and let B and $B'$ be bases for V. Then

\[ [T]_{B'} = P^{-1}[T]_B P \tag{11} \]

where $P = P_{B' \to B}$ and $P^{-1} = P_{B \to B'}$.


When applying Theorem 8.5.2, it is easy to forget whether $P = P_{B' \to B}$ (correct) or $P = P_{B \to B'}$ (incorrect). It may help to use the diagram in Figure 8.5.2 and observe that the exterior subscripts of the transition matrices match the subscript of the matrix they enclose.

\[ [T]_{B'} = P_{B \to B'}[T]_B P_{B' \to B} \]

[Figure 8.5.2: the exterior subscripts $B'$ of the two transition matrices match the subscript of $[T]_{B'}$.]

Figure 8.5.2


In  the  terminology  of  Definition  1 of  Section  5.2,  Theorem  8.5.2  tells  us  that  matrices  representing  the  same  linear  operator 
relative  to  different  bases  must  be  similar.  The  following  theorem  is  a rephrasing  of  Theorem  8.5.2  in  the  language  of 
similarity. 


THEOREM  8.5.3 

Two matrices, A and B, are similar if and only if they represent the same linear operator. Moreover, if $B = P^{-1}AP$, then P is the transition matrix from the basis that gives the matrix B to the basis that gives the matrix A.


EXAMPLE  1 Similar  Matrices  Represent  the  Same  Linear  Operator 


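We showed at the beginning of this section that the matrices

$$C = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$$

represent the same linear operator $T: R^2 \to R^2$. Verify that these matrices are similar by finding a matrix $P$ for
which $D = P^{-1}CP$.

We need to find the transition matrix

$$P = P_{B' \to B} = \big[\,[\mathbf{u}_1']_B \mid [\mathbf{u}_2']_B\,\big]$$

where $B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$ is the basis for $R^2$ given by (2) and $B = \{\mathbf{e}_1, \mathbf{e}_2\}$ is the standard basis for $R^2$. We see by
inspection that

$$\mathbf{u}_1' = \mathbf{e}_1 + \mathbf{e}_2, \qquad \mathbf{u}_2' = \mathbf{e}_1 + 2\mathbf{e}_2$$

from which it follows that

$$[\mathbf{u}_1']_B = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad\text{and}\quad [\mathbf{u}_2']_B = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

Thus,

$$P = P_{B' \to B} = \big[\,[\mathbf{u}_1']_B \mid [\mathbf{u}_2']_B\,\big] = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$$

We leave it for you to verify that

$$P^{-1} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}$$

and hence that

$$\underbrace{\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}}_{D} = \underbrace{\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}}_{P^{-1}} \underbrace{\begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix}}_{C} \underbrace{\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}}_{P}$$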
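
The verification left to the reader in Example 1 can also be carried out mechanically. The following is a minimal sketch in Python using exact rational arithmetic; the helper names `matmul` and `inverse_2x2` are illustrative and do not come from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix whose determinant is nonzero."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

C = [[1, 1], [-2, 4]]   # matrix of T relative to the standard basis B
P = [[1, 1], [1, 2]]    # transition matrix from B' to B (columns are u1', u2')

P_inv = inverse_2x2(P)
D = matmul(matmul(P_inv, C), P)   # matrix of T relative to B'

print(P_inv)
print(D)
```

The entries are stored as `Fraction` objects so that no roundoff enters the comparison with the matrices stated in the example.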


Similarity  Invariants 


Recall from Section 5.2 that a property of a square matrix is called a similarity invariant if that property is shared by all
similar matrices. In Table 1 of that section (table reproduced below), we listed the most important similarity invariants.
Since we know from Theorem 8.5.3 that two matrices are similar if and only if they represent the same linear operator
$T: V \to V$, it follows that if $B$ and $B'$ are bases for $V$, then every similarity invariant property of $[T]_B$ is also a similarity
invariant property of $[T]_{B'}$ for any other basis $B'$ for $V$. For example, for any two bases $B$ and $B'$ we must have

$$\det([T]_B) = \det([T]_{B'})$$

It follows from this equation that the value of the determinant depends on $T$, but not on the particular basis that is used to
obtain the matrix for $T$. Thus, the determinant can be regarded as a property of the linear operator $T$; indeed, if $V$ is a
finite-dimensional vector space, then we can define the determinant of the linear operator $T$ to be

$$\det(T) = \det([T]_B) \tag{12}$$

where $B$ is any basis for $V$.


Table 1 Similarity Invariants

| Property | Description |
|---|---|
| Determinant | $A$ and $P^{-1}AP$ have the same determinant. |
| Invertibility | $A$ is invertible if and only if $P^{-1}AP$ is invertible. |
| Rank | $A$ and $P^{-1}AP$ have the same rank. |
| Nullity | $A$ and $P^{-1}AP$ have the same nullity. |
| Trace | $A$ and $P^{-1}AP$ have the same trace. |
| Characteristic polynomial | $A$ and $P^{-1}AP$ have the same characteristic polynomial. |
| Eigenvalues | $A$ and $P^{-1}AP$ have the same eigenvalues. |
| Eigenspace dimension | If $\lambda$ is an eigenvalue of $A$ and $P^{-1}AP$, then the eigenspace of $A$ corresponding to $\lambda$ and the eigenspace of $P^{-1}AP$ corresponding to $\lambda$ have the same dimension. |
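
Two of the invariants in the table are easy to spot-check numerically. The sketch below is an illustration only: the matrix `P` is an arbitrary invertible matrix chosen for the demonstration, and the helper names are not from the text. It compares the determinant and trace of a matrix $A$ with those of $P^{-1}AP$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix whose determinant is nonzero."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def det_2x2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 1], [-2, 4]]
P = [[2, -3], [1, 4]]                     # arbitrary invertible matrix (assumption)
B = matmul(matmul(inverse_2x2(P), A), P)  # B = P^{-1} A P is similar to A

print(det_2x2(A), det_2x2(B))   # determinants agree
print(trace(A), trace(B))       # traces agree
```

For a $2 \times 2$ matrix the characteristic polynomial is $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$, so agreement of trace and determinant also gives agreement of the characteristic polynomial and the eigenvalues in this small case.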

EXAMPLE  2 Determinant  of  a Linear  Operator 


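At the beginning of this section we showed that the matrices

$$[T] = \begin{bmatrix} 1 & 1 \\ -2 & 4 \end{bmatrix} \quad\text{and}\quad [T]_{B'} = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$$

represent the same linear operator relative to different bases, the first relative to the standard basis $B = \{\mathbf{e}_1, \mathbf{e}_2\}$
for $R^2$ and the second relative to the basis $B' = \{\mathbf{u}_1', \mathbf{u}_2'\}$ for which

$$\mathbf{u}_1' = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \mathbf{u}_2' = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

This means that $[T]$ and $[T]_{B'}$ must be similar matrices and hence must have the same similarity invariant
properties. In particular, they must have the same determinant. We leave it for you to verify that

$$\det[T] = \begin{vmatrix} 1 & 1 \\ -2 & 4 \end{vmatrix} = 6 \quad\text{and}\quad \det[T]_{B'} = \begin{vmatrix} 2 & 0 \\ 0 & 3 \end{vmatrix} = 6$$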


EXAMPLE  3 Eigenvalues  and  Bases  for  Eigenspaces 


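Find the eigenvalues and bases for the eigenspaces of the linear operator $T: P_2 \to P_2$ defined by

$$T(a + bx + cx^2) = -2c + (a + 2b + c)x + (a + 3c)x^2$$

We leave it for you to show that the matrix for $T$ with respect to the standard basis
$B = \{1, x, x^2\}$ is

$$[T]_B = \begin{bmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}$$

The eigenvalues of $T$ are $\lambda = 1$ and $\lambda = 2$ (Example 7 of Section 5.1). Also from that example, the
eigenspace of $[T]_B$ corresponding to $\lambda = 2$ has the basis $\{\mathbf{u}_1, \mathbf{u}_2\}$, where

$$\mathbf{u}_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \qquad \mathbf{u}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

and the eigenspace of $[T]_B$ corresponding to $\lambda = 1$ has the basis $\{\mathbf{u}_3\}$, where

$$\mathbf{u}_3 = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$$

The matrices $\mathbf{u}_1$, $\mathbf{u}_2$, and $\mathbf{u}_3$ are the coordinate matrices relative to $B$ of

$$\mathbf{p}_1 = -1 + x^2, \qquad \mathbf{p}_2 = x, \qquad \mathbf{p}_3 = -2 + x + x^2$$

Thus, the eigenspace of $T$ corresponding to $\lambda = 2$ has the basis

$$\{\mathbf{p}_1, \mathbf{p}_2\} = \{-1 + x^2,\; x\}$$

and that corresponding to $\lambda = 1$ has the basis

$$\{\mathbf{p}_3\} = \{-2 + x + x^2\}$$

As a check, you can use the given formula for $T$ to verify that

$$T(\mathbf{p}_1) = 2\mathbf{p}_1, \qquad T(\mathbf{p}_2) = 2\mathbf{p}_2, \qquad\text{and}\qquad T(\mathbf{p}_3) = \mathbf{p}_3$$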
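
The check suggested at the end of Example 3 can be sketched in a few lines of Python. The representation is an assumption made for illustration: a polynomial $a + bx + cx^2$ is stored as the coefficient tuple `(a, b, c)`, and the function names are not from the text.

```python
def T(p):
    """Apply T to a + b*x + c*x^2, stored as the coefficient tuple (a, b, c)."""
    a, b, c = p
    return (-2 * c, a + 2 * b + c, a + 3 * c)

def scale(k, p):
    """Multiply a polynomial (coefficient tuple) by the scalar k."""
    return tuple(k * coeff for coeff in p)

p1 = (-1, 0, 1)   # -1 + x^2
p2 = (0, 1, 0)    # x
p3 = (-2, 1, 1)   # -2 + x + x^2

print(T(p1) == scale(2, p1))   # T(p1) = 2 p1
print(T(p2) == scale(2, p2))   # T(p2) = 2 p2
print(T(p3) == p3)             # T(p3) = p3
```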


Concept  Review 

Similarity  of  matrices  representing  a linear  operator 
Similarity  invariant 
Determinant  of  a linear  operator 

Skills 

Show that two matrices $A$ and $B$ represent the same linear operator, and find a transition matrix $P$ so that
$B = P^{-1}AP$.


Find  the  eigenvalues  and  bases  for  the  eigenspaces  of  a linear  operator  on  a finite-dimensional  vector  space. 


Exercise  Set  8.5 


In Exercises 1-7, find the matrix for $T$ relative to the basis $B$, and use Theorem 8.5.2 to compute the matrix for $T$ relative
to the basis $B'$.

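1. $T: R^2 \to R^2$ is defined by

$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 - 2x_2 \\ -x_2 \end{bmatrix}$$

and $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2\}$, where

$$\mathbf{u}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}; \qquad \mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -3 \\ 4 \end{bmatrix}$$

Answer:

$$[T]_B = \begin{bmatrix} 1 & -2 \\ 0 & -1 \end{bmatrix}, \qquad [T]_{B'} = \begin{bmatrix} -\frac{3}{11} & -\frac{56}{11} \\ -\frac{2}{11} & \frac{3}{11} \end{bmatrix}$$

2. $T: R^2 \to R^2$ is defined by

$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 + 7x_2 \\ 3x_1 - 4x_2 \end{bmatrix}$$

and $B = \{\mathbf{u}_1, \mathbf{u}_2\}$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2\}$, where

$$\mathbf{u}_1 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \quad \mathbf{u}_2 = \begin{bmatrix} 4 \\ -1 \end{bmatrix}; \qquad \mathbf{v}_1 = \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$$

3. $T: R^2 \to R^2$ is the rotation about the origin through an angle of 45°; $B$ and $B'$ are the bases in Exercise 1.

Answer:

$$[T]_B = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}, \qquad [T]_{B'} = \begin{bmatrix} \frac{13}{11\sqrt{2}} & -\frac{25}{11\sqrt{2}} \\ \frac{5}{11\sqrt{2}} & \frac{9}{11\sqrt{2}} \end{bmatrix}$$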

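4. $T: R^3 \to R^3$ is defined by

$$T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 + 2x_2 - x_3 \\ -x_2 \\ x_1 + 7x_3 \end{bmatrix}$$

and $B$ is the standard basis for $R^3$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, where

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

5. $T: R^3 \to R^3$ is the orthogonal projection on the $xy$-plane, and $B$ and $B'$ are as in Exercise 4.

Answer:

$$[T]_B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad [T]_{B'} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

6. $T: R^2 \to R^2$ is defined by $T(\mathbf{x}) = 5\mathbf{x}$, and $B$ and $B'$ are the bases in Exercise 2.

7. $T: P_1 \to P_1$ is defined by $T(a_0 + a_1 x) = a_0 + a_1(x + 1)$, and $B = \{\mathbf{p}_1, \mathbf{p}_2\}$ and $B' = \{\mathbf{q}_1, \mathbf{q}_2\}$, where
$\mathbf{p}_1 = 6 + 3x$, $\mathbf{p}_2 = 10 + 2x$, $\mathbf{q}_1 = 2$, $\mathbf{q}_2 = 3 + 2x$.

Answer:

$$[T]_B = \begin{bmatrix} \frac{2}{3} & -\frac{2}{9} \\ \frac{1}{2} & \frac{4}{3} \end{bmatrix}, \qquad [T]_{B'} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$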


8. Find $\det(T)$.

(a) $T: R^2 \to R^2$, where $T(x_1, x_2) = (3x_1 - 4x_2,\; -x_1 + 7x_2)$

(b) $T: R^3 \to R^3$, where $T(x_1, x_2, x_3) = (x_1 - x_2,\; x_2 - x_3,\; x_3 - x_1)$

(c) $T: P_2 \to P_2$, where $T(p(x)) = p(x - 1)$


9.  Prove  that  the  following  are  similarity  invariants: 

(a)  rank 

(b)  nullity 

(c)  invertibility 


10. Let $T: P_4 \to P_4$ be the linear operator given by the formula $T(p(x)) = p(2x + 1)$.

(a)  Find  a matrix  for  T relative  to  some  convenient  basis,  and  then  use  it  to  find  the  rank  and  nullity  of  T. 

(b)  Use  the  result  in  part  (a)  to  determine  whether  T is  one-to-one. 


11. In each part, find a basis for $R^2$ relative to which the matrix for $T$ is diagonal.

(a) 


(b) 


fT*il 

)_ 

N 

r 

2*i  +4*2 

fbf 

4xi  ~x2 

hi 

r 

-3xi  + *2 

Answer: 


1:11  B’  = | 

[.;]■  [-’]} 

(b)  , 

-3-/2T 

5'  = 

6 

l 

1 

—3  + ^ ~2\ 
6 
1 


6 + 3* 


12. In each part, find a basis for $R^3$ relative to which the matrix for $T$ is diagonal.


(a) 


(b) 


( 

■*l' 

\ 

T | 

*2 

{ 

/3_ 

/ 

f 

"*l" 

\ 

r| 

x2 

L, 

x3 

/ 

(c) 


*2 

IW 

/ 

-2xi+  X2-  *3 
xi  -2x2-  *3 
— xi  — X2  - 2x3 
-x2+x3_ 

—xi  + x3 
xi  *x2 

4xi  + *3 
2xi  + 3x2  + 2x3 
xi  +4x3 


13. Let $T: P_2 \to P_2$ be defined by

$$T(a_0 + a_1 x + a_2 x^2) = (5a_0 + 6a_1 + 2a_2) - (a_1 + 8a_2)x + (a_0 - 2a_2)x^2$$

(a) Find the eigenvalues of $T$.

(b) Find bases for the eigenspaces of $T$.

Answer:

(a) $\lambda = -4$, $\lambda = 3$

(b) Basis for eigenspace corresponding to $\lambda = -4$: $-2 + \frac{8}{3}x + x^2$; basis for eigenspace corresponding to
$\lambda = 3$: $5 - 2x + x^2$


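14. Let $T: M_{22} \to M_{22}$ be defined by

$$T\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = \begin{bmatrix} 2c & a + c \\ b - 2c & d \end{bmatrix}$$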

(a)  Find  the  eigenvalues  of  T. 

(b)  Find  bases  for  the  eigenspaces  of  T. 

15. Let $\lambda$ be an eigenvalue of a linear operator $T: V \to V$. Prove that the eigenvectors of $T$ corresponding to $\lambda$ are the
nonzero vectors in the kernel of $\lambda I - T$.

16. (a) Prove that if $A$ and $B$ are similar matrices, then $A^2$ and $B^2$ are also similar. More generally, prove that $A^k$ and $B^k$
are similar if $k$ is any positive integer.

(b) If $A^2$ and $B^2$ are similar, must $A$ and $B$ be similar? Explain.


17. Let $C$ and $D$ be $m \times n$ matrices, and let $B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for a vector space $V$. Show that if
$C[\mathbf{x}]_B = D[\mathbf{x}]_B$ for all $\mathbf{x}$ in $V$, then $C = D$.

18.  Find  two  nonzero  2x2  matrices  that  are  not  similar,  and  explain  why  they  are  not. 

19.  Complete  the  proof  below  by  justifying  each  step. 


Hypothesis:  A and  B are  similar  matrices. 

Conclusion:  A and  B have  the  same  characteristic  polynomial. 
Proof: 

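1. $\det(\lambda I - B) = \det(\lambda I - P^{-1}AP)$

2. $\quad = \det(\lambda P^{-1}P - P^{-1}AP)$

3. $\quad = \det(P^{-1}(\lambda I - A)P)$

4. $\quad = \det(P^{-1})\det(\lambda I - A)\det(P)$

5. $\quad = \det(P^{-1})\det(P)\det(\lambda I - A)$

6. $\quad = \det(\lambda I - A)$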

20. If $A$ and $B$ are similar matrices, say $B = P^{-1}AP$, then it follows from Exercise 19 that $A$ and $B$ have the same
eigenvalues. Suppose that $\lambda$ is one of the common eigenvalues and $\mathbf{x}$ is a corresponding eigenvector of $A$. See if you can
find an eigenvector of $B$ corresponding to $\lambda$ (expressed in terms of $\lambda$, $\mathbf{x}$, and $P$).

21.  Since  the  standard  basis  for  Rn  is  so  simple,  why  would  one  want  to  represent  a linear  operator  on  Rn  in  another  basis? 
Answer: 

The  choice  of  an  appropriate  basis  can  yield  a better  understanding  of  the  linear  operator. 

22.  Prove  that  trace  is  a similarity  invariant. 

True-False  Exercises 

In  parts  (a) — (h)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a)  A matrix  cannot  be  similar  to  itself. 

Answer: 

False 

(b)  If  A is  similar  to  B , and  B is  similar  to  C,  then  A is  similar  to  C. 

Answer: 

True 

(c)  If  A and  B are  similar  and  B is  singular,  then  A is  singular. 

Answer: 

True 

(d) If $A$ and $B$ are invertible and similar, then $A^{-1}$ and $B^{-1}$ are similar.

Answer: 

True 

(e) If $T_1: R^n \to R^n$ and $T_2: R^n \to R^n$ are linear operators, and if $[T_1]_{B',B} = [T_2]_{B',B}$ with respect to two bases $B$ and $B'$
for $R^n$, then $T_1(\mathbf{x}) = T_2(\mathbf{x})$ for every vector $\mathbf{x}$ in $R^n$.

Answer:

True

(f) If $T_1: R^n \to R^n$ is a linear operator, and if $[T_1]_B = [T_1]_{B'}$ with respect to two bases $B$ and $B'$ for $R^n$, then $B = B'$.

Answer:

False

(g) If $T: R^n \to R^n$ is a linear operator, and if $[T]_B = I_n$ with respect to some basis $B$ for $R^n$, then $T$ is the identity operator
on $R^n$.

Answer:

True

(h) If $T: R^n \to R^n$ is a linear operator, and if $[T]_{B',B} = I_n$ with respect to two bases $B$ and $B'$ for $R^n$, then $T$ is the identity
operator on $R^n$.

Answer: 

False 
Supplementary Exercises


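1. Let $A$ be an $n \times n$ matrix, $B$ a nonzero $n \times 1$ matrix, and $\mathbf{x}$ a vector in $R^n$ expressed in matrix notation. Is
$T(\mathbf{x}) = A\mathbf{x} + B$ a linear operator on $R^n$? Justify your answer.

Answer:

No. $T(\mathbf{x}_1 + \mathbf{x}_2) = A(\mathbf{x}_1 + \mathbf{x}_2) + B \neq (A\mathbf{x}_1 + B) + (A\mathbf{x}_2 + B) = T(\mathbf{x}_1) + T(\mathbf{x}_2)$, and if $c \neq 1$, then
$T(c\mathbf{x}) = cA\mathbf{x} + B \neq c(A\mathbf{x} + B) = cT(\mathbf{x})$.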


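2. Let

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

(a) Show that

$$A^2 = \begin{bmatrix} \cos 2\theta & -\sin 2\theta \\ \sin 2\theta & \cos 2\theta \end{bmatrix} \quad\text{and}\quad A^3 = \begin{bmatrix} \cos 3\theta & -\sin 3\theta \\ \sin 3\theta & \cos 3\theta \end{bmatrix}$$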

(b)  Based  on  your  answer  to  part  (a),  make  a guess  at  the  form  of  the  matrix  An  for  any  positive  integer  n. 

(c)  By  considering  the  geometric  effect  of  multiplication  by  A,  obtain  the  result  in  part  (b)  geometrically. 


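3. Let $T: V \to V$ be defined by $T(\mathbf{v}) = \|\mathbf{v}\|\mathbf{v}$. Show that $T$ is not a linear operator on $V$.

4. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ be fixed vectors in $R^n$, and let $T: R^n \to R^m$ be the function defined by
$T(\mathbf{x}) = (\mathbf{x} \cdot \mathbf{v}_1, \mathbf{x} \cdot \mathbf{v}_2, \ldots, \mathbf{x} \cdot \mathbf{v}_m)$, where $\mathbf{x} \cdot \mathbf{v}_i$ is the Euclidean inner product on $R^n$.

(a) Show that $T$ is a linear transformation.

(b) Show that the matrix with row vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ is the standard matrix for $T$.

5. Let $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \mathbf{e}_4\}$ be the standard basis for $R^4$, and let $T: R^4 \to R^3$ be the linear transformation for
which

$$T(\mathbf{e}_1) = (1, 2, 1), \quad T(\mathbf{e}_2) = (0, 1, 0), \quad T(\mathbf{e}_3) = (1, 3, 0), \quad T(\mathbf{e}_4) = (1, 1, 1)$$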

(a)  Find  bases  for  the  range  and  kernel  of  T. 

(b)  Find  the  rank  and  nullity  of  T. 


Answer: 


(a) $T(\mathbf{e}_3)$ and any two of $T(\mathbf{e}_1)$, $T(\mathbf{e}_2)$, and $T(\mathbf{e}_4)$ form bases for the range; $(-1, 1, 0, 1)$ is a basis
for the kernel.

(b)  Rank  = 3,  nullity  = 1 


6. Suppose that vectors in $R^3$ are denoted by $1 \times 3$ matrices, and define $T: R^3 \to R^3$ by

$$T([x_1 \;\; x_2 \;\; x_3]) = [x_1 \;\; x_2 \;\; x_3] \begin{bmatrix} -1 & 2 & 4 \\ 3 & 0 & 1 \\ 2 & 2 & 5 \end{bmatrix}$$


(a)  Find  a basis  for  the  kernel  of  T. 

(b)  Find  a basis  for  the  range  of  T. 

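7. Let $B = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4\}$ be a basis for a vector space $V$, and let $T: V \to V$ be the linear operator for
which

$$\begin{aligned} T(\mathbf{v}_1) &= \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3 + 3\mathbf{v}_4 \\ T(\mathbf{v}_2) &= \mathbf{v}_1 - \mathbf{v}_2 + 2\mathbf{v}_3 + 2\mathbf{v}_4 \\ T(\mathbf{v}_3) &= 2\mathbf{v}_1 - 4\mathbf{v}_2 + 5\mathbf{v}_3 + 3\mathbf{v}_4 \\ T(\mathbf{v}_4) &= -2\mathbf{v}_1 + 6\mathbf{v}_2 - 6\mathbf{v}_3 - 2\mathbf{v}_4 \end{aligned}$$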

(a)  Find  the  rank  and  nullity  of  T. 

(b)  Determine  whether  T is  one-to-one. 


Answer: 


(a) $\operatorname{rank}(T) = 2$ and $\operatorname{nullity}(T) = 2$

(b)  T is  not  one-to-one. 

8. Let $V$ and $W$ be vector spaces, let $T$, $T_1$, and $T_2$ be linear transformations from $V$ to $W$, and let $k$ be a scalar.
Define new transformations, $T_1 + T_2$ and $kT$, by the formulas

$$(T_1 + T_2)(\mathbf{x}) = T_1(\mathbf{x}) + T_2(\mathbf{x})$$

$$(kT)(\mathbf{x}) = k(T(\mathbf{x}))$$

(a) Show that $(T_1 + T_2): V \to W$ and $kT: V \to W$ are both linear transformations.

(b)  Show  that  the  set  of  all  linear  transformations  from  V to  W with  the  operations  in  part  (a)  is  a vector 
space. 

9.  Let  A and  B be  similar  matrices.  Prove: 

(a) $A^T$ and $B^T$ are similar.

(b) If $A$ and $B$ are invertible, then $A^{-1}$ and $B^{-1}$ are similar.


10. Fredholm Alternative Theorem Let $T: V \to V$ be a linear operator on an $n$-dimensional vector space.
Prove that exactly one of the following statements holds:

(i) The equation $T(\mathbf{x}) = \mathbf{b}$ has a solution for all vectors $\mathbf{b}$ in $V$.

(ii) Nullity of $T > 0$.


11. Let $T: M_{22} \to M_{22}$ be the linear operator defined by

$$T(X) = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} X + X \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}$$

Find  the  rank  and  nullity  of  T. 


Answer: 

Rank  = 3,  nullity  = 1 

12.  Prove:  If  A and  B are  similar  matrices,  and  if  B and  C are  also  similar  matrices,  then  A and  C are  similar 
matrices. 


13. Let $L: M_{22} \to M_{22}$ be the linear operator that is defined by $L(M) = M^T$. Find the matrix for $L$ with
respect to the standard basis for $M_{22}$.

Answer:

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

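14. Let $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ and $B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be bases for a vector space $V$, and let

$$P = \begin{bmatrix} 2 & -1 & 3 \\ 1 & 1 & 4 \\ 0 & 1 & 2 \end{bmatrix}$$

be the transition matrix from $B'$ to $B$.

(a) Express $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$ as linear combinations of $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$.

(b) Express $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$ as linear combinations of $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$.

15. Let $B = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ be a basis for a vector space $V$, and let $T: V \to V$ be a linear operator for which

$$[T]_B = \begin{bmatrix} -3 & 4 & 7 \\ 1 & 0 & -2 \\ 0 & 1 & 0 \end{bmatrix}$$

Find $[T]_{B'}$, where $B' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is the basis for $V$ defined by

$$\mathbf{v}_1 = \mathbf{u}_1, \quad \mathbf{v}_2 = \mathbf{u}_1 + \mathbf{u}_2, \quad \mathbf{v}_3 = \mathbf{u}_1 + \mathbf{u}_2 + \mathbf{u}_3$$

Answer:

$$[T]_{B'} = \begin{bmatrix} -4 & 0 & 9 \\ 1 & 0 & -2 \\ 0 & 1 & 1 \end{bmatrix}$$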

16.  Show  that  the  matrices 


are  similar  but  that 


are  not. 

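17. Suppose that $T: V \to V$ is a linear operator, and $B$ is a basis for $V$ for which

$$[T(\mathbf{x})]_B = \begin{bmatrix} x_1 - x_2 + x_3 \\ x_2 \\ x_1 - x_3 \end{bmatrix} \quad\text{if}\quad [\mathbf{x}]_B = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$$

Find $[T]_B$.

Answer:

$$[T]_B = \begin{bmatrix} 1 & -1 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & -1 \end{bmatrix}$$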


18. Let $T: V \to V$ be a linear operator. Prove that $T$ is one-to-one if and only if $\det(T) \neq 0$.

19. (Calculus required)

(a) Show that if $f = f(x)$ is twice differentiable, then the function $D: C^2(-\infty, \infty) \to F(-\infty, \infty)$
defined by $D(f) = f''(x)$ is a linear transformation.

(b) Find a basis for the kernel of $D$.

(c) Show that the set of functions satisfying the equation $D(f) = f(x)$ is a two-dimensional subspace of
$C^2(-\infty, \infty)$, and find a basis for this subspace.

Answer:

(b) $f(x) = x$, $g(x) = 1$

(c) $f(x) = e^x$, $g(x) = e^{-x}$

20. Let $T: P_2 \to R^3$ be the function defined by the formula

$$T(p(x)) = \begin{bmatrix} p(-1) \\ p(0) \\ p(1) \end{bmatrix}$$

(a) Find $T(x^2 + 5x + 6)$.

(b) Show that $T$ is a linear transformation.

(c) Show that $T$ is one-to-one.

(d) Find $T^{-1}(0, 3, 0)$.

(e) Sketch the graph of the polynomial in part (d).

21. Let $x_1$, $x_2$, and $x_3$ be distinct real numbers such that

$$x_1 < x_2 < x_3$$

and let $T: P_2 \to R^3$ be the function defined by the formula

$$T(p(x)) = \begin{bmatrix} p(x_1) \\ p(x_2) \\ p(x_3) \end{bmatrix}$$

(a) Show that $T$ is a linear transformation.

(b) Show that $T$ is one-to-one.

(c) Verify that if $a_1$, $a_2$, and $a_3$ are any real numbers, then

$$T^{-1}\left(\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix}\right) = a_1 P_1(x) + a_2 P_2(x) + a_3 P_3(x)$$

where

$$P_1(x) = \frac{(x - x_2)(x - x_3)}{(x_1 - x_2)(x_1 - x_3)}, \quad P_2(x) = \frac{(x - x_1)(x - x_3)}{(x_2 - x_1)(x_2 - x_3)}, \quad P_3(x) = \frac{(x - x_1)(x - x_2)}{(x_3 - x_1)(x_3 - x_2)}$$

(d) What relationship exists between the graph of the function

$$a_1 P_1(x) + a_2 P_2(x) + a_3 P_3(x)$$

and the points $(x_1, a_1)$, $(x_2, a_2)$, and $(x_3, a_3)$?

Answer:

(d) The points are on the graph.

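22. (Calculus required) Let $p(x)$ and $q(x)$ be continuous functions, and let $V$ be the subspace of
$C(-\infty, +\infty)$ consisting of all twice differentiable functions. Define $L: V \to V$ by

$$L(y(x)) = y''(x) + p(x)y'(x) + q(x)y(x)$$

(a) Show that $L$ is a linear transformation.

(b) Consider the special case where $p(x) = 0$ and $q(x) = 1$. Show that the function

$$\phi(x) = c_1 \sin x + c_2 \cos x$$

is in the kernel of $L$ for all real values of $c_1$ and $c_2$.

23. (Calculus required) Let $D: P_n \to P_n$ be the differentiation operator $D(\mathbf{p}) = \mathbf{p}'$. Show that the matrix for $D$
relative to the basis $B = \{1, x, x^2, \ldots, x^n\}$ is

$$\begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 2 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & n \\ 0 & 0 & 0 & 0 & \cdots & 0 \end{bmatrix}$$

24. (Calculus required) It can be shown that for any real number $c$, the vectors

$$1, \quad x - c, \quad \frac{(x - c)^2}{2!}, \quad \ldots, \quad \frac{(x - c)^n}{n!}$$

form a basis for $P_n$. Find the matrix for the differentiation operator of Exercise 23 with respect to this
basis.

25. (Calculus required) Let $J: P_n \to P_{n+1}$ be the integration transformation defined by

$$J(\mathbf{p}) = \int_0^x (a_0 + a_1 t + \cdots + a_n t^n)\, dt = a_0 x + \frac{a_1}{2}x^2 + \cdots + \frac{a_n}{n+1}x^{n+1}$$

where $\mathbf{p} = a_0 + a_1 x + \cdots + a_n x^n$. Find the matrix for $J$ with respect to the standard bases for $P_n$ and
$P_{n+1}$.

Answer:

$$\begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 0 & \cdots & 0 \\ 0 & \frac{1}{2} & 0 & \cdots & 0 \\ 0 & 0 & \frac{1}{3} & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & \frac{1}{n+1} \end{bmatrix}$$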
INTRODUCTION

This  chapter  is  concerned  with  “numerical  methods”  of  linear  algebra,  an  area  of  study 
that  encompasses  techniques  for  solving  large-scale  linear  systems  and  for  finding 
numerical  approximations  of  various  kinds.  It  is  not  our  objective  to  discuss  algorithms 
and  technical  issues  in  fine  detail,  since  there  are  many  excellent  books  on  the  subject. 
Rather,  we  will  be  concerned  with  introducing  some  of  the  basic  ideas  and  exploring 
important  contemporary  applications  that  rely  heavily  on  numerical  ideas — singular  value 
decomposition  and  data  compression.  A computing  utility  such  as  MATLAB, 
Mathematica,  or  Maple  is  recommended  for  Section  9.2  to  Section  9.6  . 


Copyright  © 2010  John  Wiley  & Sons,  Inc.  All  rights  reserved. 


9.1 LU-Decompositions

Up  to  now,  we  have  focused  on  two  methods  for  solving  linear  systems,  Gaussian  elimination  (reduction  to  row 
echelon  form)  and  Gauss-Jordan  elimination  (reduction  to  reduced  row  echelon  form).  While  these  methods  are 
fine  for  the  small-scale  problems  in  this  text,  they  are  not  suitable  for  large-scale  problems  in  which  computer 
roundoff  error,  memory  usage,  and  speed  are  concerns.  In  this  section  we  will  discuss  a method  for  solving  a linear 
system  of  n equations  in  n unknowns  that  is  based  on  factoring  its  coefficient  matrix  into  a product  of  lower  and 
upper triangular matrices. This method, called “LU-decomposition,” is the basis for many computer algorithms in
common  use. 


Solving  Linear  Systems  by  Factoring 

Our first goal in this section is to show how to solve a linear system $A\mathbf{x} = \mathbf{b}$ of $n$ equations in $n$ unknowns by
factoring the coefficient matrix $A$ into a product


A = LU  (1) 

where  L is  lower  triangular  and  U is  upper  triangular.  Once  we  understand  how  to  do  this,  we  will  discuss  how  to 
obtain  the  factorization  itself. 

Assuming that we have somehow obtained the factorization in (1), the linear system $A\mathbf{x} = \mathbf{b}$ can be solved by the
following procedure, called LU-decomposition.


The Method of LU-Decomposition

Step 1. Rewrite the system $A\mathbf{x} = \mathbf{b}$ as

$$LU\mathbf{x} = \mathbf{b} \tag{2}$$

Step 2. Define a new $n \times 1$ matrix $\mathbf{y}$ by

$$U\mathbf{x} = \mathbf{y} \tag{3}$$

Step 3. Use (3) to rewrite (2) as $L\mathbf{y} = \mathbf{b}$ and solve this system for $\mathbf{y}$.

Step 4. Substitute $\mathbf{y}$ in (3) and solve for $\mathbf{x}$.

This  procedure,  which  is  illustrated  in  Figure  9.1.1,  replaces  the  single  linear  system  Ax  = b by  a pair  of  linear 
systems 

Ux  = y 

L y = b 

that must be solved in succession. However, since each of these systems has a triangular coefficient matrix, it
generally turns out to involve no more computation to solve the two systems than to solve the original system
directly.
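
The four-step method can be sketched in Python as follows. This is a minimal illustration rather than a production routine: it assumes that $L$ and $U$ are already known, are square, and have nonzero diagonal entries, and the function names are illustrative. The sample data are the factors and right-hand side used in Example 1 of this section.

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Solve L y = b for lower triangular L, working from the top row down."""
    n = len(b)
    y = [Fraction(0)] * n
    for i in range(n):
        known = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (Fraction(b[i]) - known) / L[i][i]
    return y

def back_substitution(U, y):
    """Solve U x = y for upper triangular U, working from the bottom row up."""
    n = len(y)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(y[i]) - known) / U[i][i]
    return x

def lu_solve(L, U, b):
    """Steps 3 and 4: solve L y = b, then U x = y."""
    return back_substitution(U, forward_substitution(L, b))

L = [[2, 0, 0], [-3, 1, 0], [4, -3, 7]]
U = [[1, 3, 1], [0, 1, 3], [0, 0, 1]]
b = [2, 2, 3]

y = forward_substitution(L, b)
x = lu_solve(L, U, b)
print(y, x)
```

Each triangular solve touches only the entries on one side of the diagonal, which is why the pair of systems costs no more than a single elimination once the factors are available.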


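[Figure 9.1.1: diagram of the two routes from $\mathbf{b}$ to $\mathbf{x}$: solving $A\mathbf{x} = \mathbf{b}$ directly, or solving $L\mathbf{y} = \mathbf{b}$ for $\mathbf{y}$ and then $U\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$.]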


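EXAMPLE 1 Solving Ax = b by LU-Decomposition


Later in this section we will derive the factorization

$$\underbrace{\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U} \tag{4}$$

Use this result to solve the linear system

$$\underbrace{\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{\mathbf{b}}$$

From (4) we can rewrite this system as

$$\underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{\mathbf{b}} \tag{5}$$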

In  1979  an  important  library  of  machine-independent  linear  algebra 
programs  called  LINPACK  was  developed  at  Argonne  National  Laboratories.  Many  of  the 
programs  in  that  library  use  the  decomposition  methods  that  we  will  study  in  this  section. 
Variations  of  the  LINPACK  routines  are  used  in  many  computer  programs,  including 
MATLAB,  Mathematica,  and  Maple. 


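As specified in Step 2 above, let us define $y_1$, $y_2$, and $y_3$ by the equation

$$\underbrace{\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}}_{\mathbf{y}} \tag{6}$$

which allows us to rewrite (5) as

$$\underbrace{\begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}}_{L} \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}}_{\mathbf{y}} = \underbrace{\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}}_{\mathbf{b}} \tag{7}$$

or equivalently as

$$\begin{aligned} 2y_1 &= 2 \\ -3y_1 + y_2 &= 2 \\ 4y_1 - 3y_2 + 7y_3 &= 3 \end{aligned}$$

This system can be solved by a procedure that is similar to back substitution, except that we solve the
equations from the top down instead of from the bottom up. This procedure, called forward
substitution, yields

$$y_1 = 1, \quad y_2 = 5, \quad y_3 = 2$$

(verify). As indicated in Step 4 above, we substitute these values into (6), which yields the linear
system

$$\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}$$

or, equivalently,

$$\begin{aligned} x_1 + 3x_2 + x_3 &= 1 \\ x_2 + 3x_3 &= 5 \\ x_3 &= 2 \end{aligned}$$

Solving this system by back substitution yields

$$x_1 = 2, \quad x_2 = -1, \quad x_3 = 2$$

(verify).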


Alan  Mathison  Turing  (1912-1954) 

Although  the  ideas  were  known  earlier,  credit  for  popularizing  the  matrix 
formulation of the LU-decomposition is often given to the British mathematician Alan
Turing  for  his  work  on  the  subject  in  1948.  Turing,  one  of  the  great  geniuses  of  the  twentieth 
century,  is  the  founder  of  the  field  of  artificial  intelligence.  Among  his  many 
accomplishments  in  that  field,  he  developed  the  concept  of  an  internally  programmed 
computer  before  the  practical  technology  had  reached  the  point  where  the  construction  of 


such  a machine  was  possible.  During  World  War  II  Turing  was  secretly  recruited  by  the 
British  government's  Code  and  Cypher  School  at  Bletchley  Park  to  help  break  the  Nazi 
Enigma  codes;  it  was  Turing's  statistical  approach  that  provided  the  breakthrough.  In  addition 
to  being  a brilliant  mathematician,  Turing  was  a world-class  runner  who  competed 
successfully  with  Olympic-level  competition.  Sadly,  Turing,  a homosexual,  was  tried  and 
convicted  of  “gross  indecency”  in  1952,  in  violation  of  the  then-existing  British  statutes. 
Depressed,  he  committed  suicide  at  age  41  by  eating  an  apple  laced  with  cyanide. 

[Image:  Time  & Life  Pictures/Getty  Images,  Inc.] 


Finding LU-Decompositions

Example  1 makes  it  clear  that  after  A is  factored  into  lower  and  upper  triangular  matrices,  the  system  Ax  = b can 
be  solved  by  one  forward  substitution  and  one  back  substitution.  We  will  now  show  how  to  obtain  such 
factorizations.  We  begin  with  some  terminology. 


DEFINITION  1 

A factorization of a square matrix $A$ as $A = LU$, where $L$ is lower triangular and $U$ is upper triangular, is
called an LU-decomposition (or LU-factorization) of $A$.


Not every square matrix has an LU-decomposition. However, we will see that if it is possible to reduce a square
matrix $A$ to row echelon form by Gaussian elimination without performing any row interchanges, then $A$ will have
an LU-decomposition, though it may not be unique. To see why this is so, assume that $A$ has been reduced to a row
echelon form $U$ using a sequence of row operations that does not include row interchanges. We know from
Theorem 1.5.1 that these operations can be accomplished by multiplying $A$ on the left by an appropriate sequence
of elementary matrices; that is, there exist elementary matrices $E_1, E_2, \ldots, E_k$ such that

$$E_k \cdots E_2 E_1 A = U \tag{8}$$

Since elementary matrices are invertible, we can solve (8) for $A$ as

$$A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} U$$

or more briefly as

$$A = LU \tag{9}$$

where

$$L = E_1^{-1} E_2^{-1} \cdots E_k^{-1} \tag{10}$$


We  now  have  all  of  the  ingredients  to  prove  the  following  result. 


THEOREM  9.1.1 

If $A$ is a square matrix that can be reduced to a row echelon form $U$ by Gaussian elimination without row
interchanges, then $A$ can be factored as $A = LU$, where $L$ is a lower triangular matrix.


Proof Let $L$ and $U$ be the matrices in Formulas (10) and (8), respectively. The matrix $U$ is upper triangular because it
is a row echelon form of a square matrix (so all entries below its main diagonal are zero). To prove that $L$ is lower
triangular, it suffices to prove that each factor on the right side of (10) is lower triangular, since Theorem 1.7.1(b) will
then imply that $L$ itself is lower triangular. Since row interchanges are excluded, each $E_j$ results either by adding a
scalar multiple of one row of an identity matrix to a row below or by multiplying one row of an identity matrix by a
nonzero scalar. In either case, the resulting matrix $E_j$ is lower triangular and hence so is $E_j^{-1}$ by Theorem 1.7.1(d).
This completes the proof.


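EXAMPLE 2 An LU-Decomposition


Find an LU-decomposition of

$$A = \begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}$$

To obtain an LU-decomposition, $A = LU$, we will reduce $A$ to a row echelon form $U$ using Gaussian
elimination and then calculate $L$ from (10). The steps are as follows. For each step we list the row operation, the
elementary matrix corresponding to the row operation, the inverse of the elementary matrix, and the matrix that results.

Starting matrix:

$$\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}$$

Step 1. $\frac{1}{2} \times$ row 1

$$E_1 = \begin{bmatrix} \frac{1}{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_1^{-1} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad\text{result: } \begin{bmatrix} 1 & 3 & 1 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}$$

Step 2. $(3 \times \text{row 1}) + \text{row 2}$

$$E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad\text{result: } \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 4 & 9 & 2 \end{bmatrix}$$

Step 3. $(-4 \times \text{row 1}) + \text{row 3}$

$$E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}, \quad E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}, \quad\text{result: } \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & -3 & -2 \end{bmatrix}$$

Step 4. $(3 \times \text{row 2}) + \text{row 3}$

$$E_4 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{bmatrix}, \quad E_4^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}, \quad\text{result: } \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 7 \end{bmatrix}$$

Step 5. $\frac{1}{7} \times$ row 3

$$E_5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{1}{7} \end{bmatrix}, \quad E_5^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix}, \quad\text{result: } \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} = U$$

and, from (10),

$$L = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix}$$

so

$$\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix} \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}$$

is an LU-decomposition of $A$.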
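
The product in Formula (10) for Example 2 can be checked with a short script. This is an illustrative sketch only; the list `E_inverses` holds the inverses $E_1^{-1}, \ldots, E_5^{-1}$ of the elementary matrices from the example, and `matmul` is a helper introduced here, not something from the text.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Inverses of the elementary matrices, in the order E1^{-1}, ..., E5^{-1}.
E_inverses = [
    [[2, 0, 0], [0, 1, 0], [0, 0, 1]],    # undo (1/2) x row 1
    [[1, 0, 0], [-3, 1, 0], [0, 0, 1]],   # undo (3 x row 1) + row 2
    [[1, 0, 0], [0, 1, 0], [4, 0, 1]],    # undo (-4 x row 1) + row 3
    [[1, 0, 0], [0, 1, 0], [0, -3, 1]],   # undo (3 x row 2) + row 3
    [[1, 0, 0], [0, 1, 0], [0, 0, 7]],    # undo (1/7) x row 3
]

L = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]     # start from the identity
for E_inv in E_inverses:
    L = matmul(L, E_inv)                  # L = E1^{-1} E2^{-1} ... E5^{-1}

U = [[1, 3, 1], [0, 1, 3], [0, 0, 1]]
A = [[2, 6, 2], [-3, -8, 0], [4, 9, 2]]

print(L)
print(matmul(L, U) == A)
```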

Bookkeeping 

As  Example  2 shows,  most  of  the  work  in  constructing  an  L [/-decomposition  is  expended  in  calculating  L. 
However,  all  this  work  can  be  eliminated  by  some  careful  bookkeeping  of  the  operations  used  to  reduce  A to  U. 


Because  we  are  assuming  that  no  row  interchanges  are  required  to  reduce  A to  U,  there  are  only  two  types  of 
operations  involved — multiplying  a row  by  a nonzero  constant,  and  adding  a scalar  multiple  of  one  row  to  another. 
The  first  operation  is  used  to  introduce  the  leading  l's  and  the  second  to  introduce  zeros  below  the  leading  l's. 


In  Example  2,  a multiplier  of  was  needed  in  Step  1 to  introduce  a leading  1 in  the  first  row,  and  a multiplier  of  y 


was  needed  in  Step  5 to  introduce  a leading  1 in  the  third  row.  No  actual  multiplier  was  required  to  introduce  a 
leading  1 in  the  second  row  because  it  was  already  a 1 at  the  end  of  Step  2,  but  for  convenience  let  us  say  that  the 
multiplier  was  1 . Comparing  these  multipliers  with  the  successive  diagonal  entries  of  L,  we  see  that  these  diagonal 
entries  are  precisely  the  reciprocals  of  the  multipliers  used  to  construct  U: 


L = 


2 0 0 
-3  1 0 

4-3  7 


(11) 


Also observe in Example 2 that to introduce zeros below the leading 1 in the first row, we used the operations

add 3 times the first row to the second
add −4 times the first row to the third

and to introduce the zero below the leading 1 in the second row, we used the operation

add 3 times the second row to the third

Now note in (12) that in each position below the main diagonal of L, the entry is the negative of the multiplier in the operation that introduced the zero in that position in U:

\[
L = \begin{bmatrix} 2 & 0 & 0 \\ -3 & 1 & 0 \\ 4 & -3 & 7 \end{bmatrix} \tag{12}
\]

This suggests the following procedure for constructing an LU-decomposition of a square matrix A, assuming that this matrix can be reduced to row echelon form without row interchanges.


Procedure for Constructing an LU-Decomposition

Step 1. Reduce A to a row echelon form U by Gaussian elimination without row interchanges, keeping track of the multipliers used to introduce the leading 1's and the multipliers used to introduce the zeros below the leading 1's.

Step 2. In each position along the main diagonal of L, place the reciprocal of the multiplier that introduced the leading 1 in that position in U.

Step 3. In each position below the main diagonal of L, place the negative of the multiplier used to introduce the zero in that position in U.

Step 4. Form the decomposition A = LU.
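
The four steps above can be sketched in code. The following Python function is only an illustration under stated assumptions: the name `lu_by_bookkeeping` is hypothetical, exact fractions are used to avoid roundoff, and the sketch assumes that every pivot encountered is nonzero (so it does not handle matrices whose row echelon form has a column without a leading 1).

```python
from fractions import Fraction


def lu_by_bookkeeping(A):
    """Return (L, U) with A = LU, where U has 1's on its main diagonal.

    Assumption of this sketch: A is square and every pivot is nonzero,
    so no row interchanges are needed.
    """
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        pivot = U[j][j]
        if pivot == 0:
            raise ValueError("zero pivot: a row interchange would be needed")
        # Step 2: the leading 1 is introduced with multiplier 1/pivot,
        # so the diagonal entry of L is the reciprocal, namely pivot.
        L[j][j] = pivot
        U[j] = [x / pivot for x in U[j]]
        for i in range(j + 1, n):
            m = -U[i][j]  # operation: add m times row j to row i
            # Step 3: the entry of L is the negative of that multiplier.
            L[i][j] = -m
            U[i] = [a + m * b for a, b in zip(U[i], U[j])]
    return L, U


# The matrix of Example 2.
L, U = lu_by_bookkeeping([[2, 6, 2], [-3, -8, 0], [4, 9, 2]])
# L has rows (2, 0, 0), (-3, 1, 0), (4, -3, 7), as in Formula (12).
```

Recording the pivot and the negated multipliers as they arise is exactly the bookkeeping described in Steps 2 and 3, so no elementary matrices have to be inverted or multiplied.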


EXAMPLE 3 Constructing an LU-Decomposition

Find an LU-decomposition of

\[
A = \begin{bmatrix} 6 & -2 & 0 \\ 9 & -1 & 1 \\ 3 & 7 & 5 \end{bmatrix}
\]


We will reduce A to a row echelon form U and at each step we will fill in an entry of L in accordance with the four-step procedure above.

Thus, we have constructed the LU-decomposition

\[
A = LU =
\begin{bmatrix} 6 & 0 & 0 \\ 9 & 2 & 0 \\ 3 & 8 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -\tfrac{1}{3} & 0 \\ 0 & 1 & \tfrac{1}{2} \\ 0 & 0 & 1 \end{bmatrix}
\]

[Margin notes: • denotes an unknown entry of L. No actual operation is performed in the last step, since there is already a leading 1 in the third row.]

We leave it for you to confirm this end result by multiplying the factors.


LU-Decompositions  Are  Not  Unique 

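In the absence of restrictions, LU-decompositions are not unique. For example, if

\[
A = LU =
\begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix}
\begin{bmatrix} 1 & u_{12} & u_{13} \\ 0 & 1 & u_{23} \\ 0 & 0 & 1 \end{bmatrix}
\]

and L has nonzero diagonal entries, then we can shift the diagonal entries from the left factor to the right factor by writing

\[
A =
\begin{bmatrix} 1 & 0 & 0 \\ l_{21}/l_{11} & 1 & 0 \\ l_{31}/l_{11} & l_{32}/l_{22} & 1 \end{bmatrix}
\begin{bmatrix} l_{11} & l_{11}u_{12} & l_{11}u_{13} \\ 0 & l_{22} & l_{22}u_{23} \\ 0 & 0 & l_{33} \end{bmatrix}
\]

which is another LU-decomposition of A.

LDU-Decompositions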


The method we have described for computing LU-decompositions may result in an "asymmetry" in that the matrix U has 1's on the main diagonal but L need not. However, if it is preferred to have 1's on the main diagonal of the lower triangular factor, then we can "shift" the diagonal entries of L to a diagonal matrix D and write L as

L = L'D

where L' is a lower triangular matrix with 1's on the main diagonal. For example, a general 3 × 3 lower triangular matrix with nonzero entries on the main diagonal can be factored as

\[
\underbrace{\begin{bmatrix} a_{11} & 0 & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & a_{33} \end{bmatrix}}_{L}
=
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ a_{21}/a_{11} & 1 & 0 \\ a_{31}/a_{11} & a_{32}/a_{22} & 1 \end{bmatrix}}_{L'}
\underbrace{\begin{bmatrix} a_{11} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{33} \end{bmatrix}}_{D}
\]


Note that the columns of L' are obtained by dividing each entry in the corresponding column of L by the diagonal entry in the column. Thus, for example, we can rewrite (4) as

\[
\begin{bmatrix} 2 & 6 & 2 \\ -3 & -8 & 0 \\ 4 & 9 & 2 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \\ -\tfrac{3}{2} & 1 & 0 \\ 2 & -3 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 7 \end{bmatrix}
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}
\]

One can prove that if A is a square matrix that can be reduced to row echelon form without row interchanges, then A can be factored uniquely as

A = LDU

where L is a lower triangular matrix with 1's on the main diagonal, D is a diagonal matrix, and U is an upper triangular matrix with 1's on the main diagonal. This is called the LDU-decomposition (or LDU-factorization) of A.
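
The shift of the diagonal entries from L into D can be sketched as follows. This is a minimal illustration, assuming a lower triangular input with nonzero diagonal entries; the function name `split_diagonal` is hypothetical.

```python
from fractions import Fraction


def split_diagonal(L):
    """Factor a lower triangular L (nonzero diagonal) as L = L'D.

    Column j of L' is column j of L divided by the diagonal entry L[j][j];
    D is the diagonal matrix holding those diagonal entries.
    """
    n = len(L)
    D = [[Fraction(L[i][i]) if i == j else Fraction(0) for j in range(n)]
         for i in range(n)]
    L_prime = [[Fraction(L[i][j]) / L[j][j] for j in range(n)]
               for i in range(n)]
    return L_prime, D


# The lower triangular factor found in Example 2.
L_prime, D = split_diagonal([[2, 0, 0], [-3, 1, 0], [4, -3, 7]])
# L_prime has rows (1, 0, 0), (-3/2, 1, 0), (2, -3, 1); D = diag(2, 1, 7).
```

Combined with the factor U from an LU-decomposition whose U has 1's on the diagonal, this gives the three factors of A = L'DU.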


PLU-Decompositions

Many computer algorithms for solving linear systems perform row interchanges to reduce roundoff error, in which case the existence of an LU-decomposition is not guaranteed. However, it is possible to work around this problem by "preprocessing" the coefficient matrix A so that the row interchanges are performed prior to computing the LU-decomposition itself. More specifically, the idea is to create a matrix Q (called a permutation matrix) by multiplying, in sequence, those elementary matrices that produce the row interchanges and then execute them by computing the product QA. This product can then be reduced to row echelon form without row interchanges, so it is assured to have an LU-decomposition


\[
QA = LU \tag{13}
\]

Because the matrix Q is invertible (being a product of elementary matrices), the systems Ax = b and QAx = Qb will have the same solutions. But it follows from (13) that the latter system can be rewritten as LUx = Qb and hence can be solved using LU-decomposition.

It is common to see Equation (13) expressed as

\[
A = PLU \tag{14}
\]

in which $P = Q^{-1}$. This is called a PLU-decomposition (or PLU-factorization) of A.
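
To illustrate how a system is solved once QA = LU is known, here is a hedged Python sketch. The function name `solve_permuted_lu`, the convention of describing Q by a list `order` (row i of QA is row `order[i]` of A), and the sample matrix are all assumptions made for this illustration. The sample A is just the matrix of Example 2 with its first two rows swapped, so that QA equals that matrix and its known factors can be reused.

```python
from fractions import Fraction


def solve_permuted_lu(order, L, U, b):
    """Solve Ax = b given QA = LU, where row i of QA is row order[i] of A.

    The system is rewritten as LUx = Qb and solved in two triangular stages.
    Assumes L and U have nonzero diagonal entries.
    """
    n = len(b)
    qb = [Fraction(b[i]) for i in order]  # the vector Qb
    y = []
    for i in range(n):  # forward substitution: Ly = Qb
        s = sum(L[i][k] * y[k] for k in range(i))
        y.append((qb[i] - s) / L[i][i])
    x = [Fraction(0)] * n
    for i in reversed(range(n)):  # back substitution: Ux = y
        s = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


# Hypothetical system: A = [[-3, -8, 0], [2, 6, 2], [4, 9, 2]], b = (-11, 10, 15).
# Swapping the first two rows of A gives the matrix of Example 2.
L = [[2, 0, 0], [-3, 1, 0], [4, -3, 7]]
U = [[1, 3, 1], [0, 1, 3], [0, 0, 1]]
x = solve_permuted_lu([1, 0, 2], L, U, [-11, 10, 15])  # x = (1, 1, 1)
```

Note that the permutation is applied only to the right-hand side b; the factors L and U are computed once and can be reused for any number of right-hand sides.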


Concept  Review 

LU-decomposition
LDU-decomposition
PLU-decomposition

Skills

Determine whether a square matrix has an LU-decomposition.
Find an LU-decomposition of a square matrix.
Use the method of LU-decomposition to solve linear systems.
Find the LDU-decomposition of a square matrix.
Find a PLU-decomposition of a square matrix.


Exercise  Set  9.1 


1. Use the method of Example 1 and the LU-decomposition

\[
\begin{bmatrix} 3 & -6 \\ -2 & 5 \end{bmatrix}
=
\begin{bmatrix} 3 & 0 \\ -2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}
\]

to solve the system

\[
\begin{aligned}
3x_1 - 6x_2 &= 0 \\
-2x_1 + 5x_2 &= 1
\end{aligned}
\]

Answer:

$x_1 = 2,\ x_2 = 1$

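2. Use the method of Example 1 and the LU-decomposition

\[
\begin{bmatrix} 3 & -6 & -3 \\ 2 & 0 & 6 \\ -4 & 7 & 4 \end{bmatrix}
=
\begin{bmatrix} 3 & 0 & 0 \\ 2 & 4 & 0 \\ -4 & -1 & 2 \end{bmatrix}
\begin{bmatrix} 1 & -2 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}
\]

to solve the system

\[
\begin{aligned}
3x_1 - 6x_2 - 3x_3 &= -3 \\
2x_1 \phantom{{}- 6x_2} + 6x_3 &= -22 \\
-4x_1 + 7x_2 + 4x_3 &= 3
\end{aligned}
\]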

In  Exercises  3-10,  find  an  LU-decomposition  of  the  coefficient  matrix,  and  then  use  the  method  of  Example  1 to 
solve  the  system. 


3.

\[
\begin{bmatrix} 2 & 8 \\ -1 & -1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
=
\begin{bmatrix} -2 \\ -2 \end{bmatrix}
\]

Answer:

$x_1 = 3,\ x_2 = -1$

Answer:

$x_1 = -1,\ x_2 = 1,\ x_3 = 0$


—3  12  -6' 

Of 

1 -2  2 

x2 

= 

0 1 1 

x3 

0 
WO 

WO 

1  

Of 

r 

-8  -7  -9 

*2 

= 

1 

O 

ro 

CO 

*3 

Answer:

$x_1 = -1,\ x_2 = 1,\ x_3 = 0$

Answer:

$x_1 = -3,\ x_2 = 1,\ x_3 = 2,\ x_4 = 1$

(a) Find an LU-decomposition of A.

(b) Express A in the form $A = L_1DU_1$, where $L_1$ is lower triangular with 1's along the main diagonal, $U_1$ is upper triangular, and D is a diagonal matrix.

(c) Express A in the form $A = L_2U_2$, where $L_2$ is lower triangular with 1's along the main diagonal and $U_2$ is upper triangular.


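Answer:

(a)
\[
A = LU =
\begin{bmatrix} 2 & 0 & 0 \\ -2 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & \tfrac{1}{2} & -\tfrac{1}{2} \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\]

(b)
\[
A = L_1DU_1 =
\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & \tfrac{1}{2} & -\tfrac{1}{2} \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\]

(c)
\[
A = L_2U_2 =
\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 1 & -1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\]

In Exercises 12-13, find an LDU-decomposition of A.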
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Answer: 
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0 1 0 

2 -2  1 

0 0 1 

0 0 1 

14.

(a) Show that the matrix

\[
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\]

has no LU-decomposition.

(b) Find a PLU-decomposition of this matrix.

In Exercises 15-16, use the given PLU-decomposition of A to solve the linear system Ax = b by rewriting it as $P^{-1}A\mathbf{x} = P^{-1}\mathbf{b}$ and solving this system by LU-decomposition.
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= PLU 


Answer: 

21 


14 
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* 1 — 17  ’ *2  — ~ 17  ’ *3  — 17 
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; A = 


4 1 2 
0 2 1 
8 1 8 
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= PLU 


In Exercises 17-18, find a PLU-decomposition of A, and use it to solve the linear system Ax = b by the method of Exercises 15 and 16.
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*1=  *2  = ^,  *3  = 3 


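18.

19. Let

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]

(a) Prove: If $a \neq 0$, then the matrix A has a unique LU-decomposition with 1's along the main diagonal of L.

(b) Find the LU-decomposition described in part (a).

Answer:

\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
=
\begin{bmatrix} 1 & 0 \\ \dfrac{c}{a} & 1 \end{bmatrix}
\begin{bmatrix} a & b \\ 0 & \dfrac{ad - bc}{a} \end{bmatrix}
\]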

20. Let Ax = b be a linear system of n equations in n unknowns, and assume that A is an invertible matrix that can be reduced to row echelon form without row interchanges. How many additions and multiplications are required to solve the system by the method of Example 1?

21. Prove: If A is any n × n matrix, then A can be factored as A = PLU, where L is lower triangular, U is upper triangular, and P can be obtained by interchanging the rows of $I_n$ appropriately. [Hint: Let U be a row echelon form of A, and let all row interchanges required in the reduction of A to U be performed first.]

True-False  Exercises 

In  parts  (a)-(e)  determine  whether  the  statement  is  true  or  false,  and  justify  your  answer. 

(a) Every square matrix has an LU-decomposition.

Answer:

False

(b) If a square matrix A is row equivalent to an upper triangular matrix U, then A has an LU-decomposition.

Answer:

False

(c) If $L_1, L_2, \ldots, L_k$ are n × n lower triangular matrices, then the product $L_1L_2\cdots L_k$ is lower triangular.

Answer:

True

(d) If a square matrix A has an LU-decomposition, then A has a unique LDU-decomposition.

Answer:

True

(e) Every square matrix has a PLU-decomposition.

Answer:

True


Copyright  © 2010  John  Wiley  & Sons,  Inc.  All  rights  reserved. 


9.2  The  Power  Method 

The  eigenvalues  of  a square  matrix  can,  in  theory,  be  found  by  solving  the  characteristic  equation.  However,  this 
procedure  has  so  many  computational  difficulties  that  it  is  almost  never  used  in  applications.  In  this  section  we  will 
discuss  an  algorithm  that  can  be  used  to  approximate  the  eigenvalue  with  greatest  absolute  value  and  a corresponding 
eigenvector.  This  particular  eigenvalue  and  its  corresponding  eigenvectors  are  important  because  they  arise  naturally  in 
many  iterative  processes.  The  methods  we  will  study  in  this  section  have  recently  been  used  to  create  Internet  search 
engines  such  as  Google.  We  will  discuss  this  application  in  the  next  section. 


The  Power  Method 

There are many applications in which some vector $\mathbf{x}_0$ in $R^n$ is multiplied repeatedly by an n × n matrix A to produce a sequence

\[
\mathbf{x}_0,\ A\mathbf{x}_0,\ A^2\mathbf{x}_0,\ \ldots,\ A^k\mathbf{x}_0,\ \ldots
\]

We  call  a sequence  of  this  form  a power  sequence  generated  by  A.  In  this  section  we  will  be  concerned  with  the 
convergence  of  power  sequences  and  how  such  sequences  can  be  used  to  approximate  eigenvalues  and  eigenvectors. 
For  this  purpose,  we  make  the  following  definition. 


DEFINITION  1 

If the distinct eigenvalues of a matrix A are $\lambda_1, \lambda_2, \ldots, \lambda_k$, and if $|\lambda_1|$ is larger than $|\lambda_2|, \ldots, |\lambda_k|$, then $\lambda_1$ is called a dominant eigenvalue of A. Any eigenvector corresponding to a dominant eigenvalue is called a dominant eigenvector of A.


EXAMPLE  1 Dominant  Eigenvalues 

Some matrices have dominant eigenvalues and some do not. For example, if the distinct eigenvalues of a matrix are

\[
\lambda_1 = -4,\quad \lambda_2 = -2,\quad \lambda_3 = 1,\quad \lambda_4 = 3
\]

then $\lambda_1 = -4$ is dominant since $|\lambda_1| = 4$ is greater than the absolute values of all the other eigenvalues; but if the distinct eigenvalues of a matrix are

\[
\lambda_1 = 7,\quad \lambda_2 = -7,\quad \lambda_3 = -2,\quad \lambda_4 = 5
\]

then $|\lambda_1| = |\lambda_2| = 7$, so there is no eigenvalue whose absolute value is greater than the absolute value of all the other eigenvalues.
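
The test in Definition 1 is easy to mechanize when the distinct eigenvalues are already known. The sketch below is illustrative only; the function name `dominant_eigenvalue` is hypothetical, and it assumes real eigenvalues given as a list of distinct numbers.

```python
def dominant_eigenvalue(distinct_eigenvalues):
    """Return the dominant eigenvalue, or None if there is none.

    An eigenvalue is dominant when its absolute value is strictly larger
    than the absolute values of all the other distinct eigenvalues.
    """
    largest = max(abs(ev) for ev in distinct_eigenvalues)
    candidates = [ev for ev in distinct_eigenvalues if abs(ev) == largest]
    return candidates[0] if len(candidates) == 1 else None


print(dominant_eigenvalue([-4, -2, 1, 3]))  # -4
print(dominant_eigenvalue([7, -7, -2, 5]))  # None
```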


The  most  important  theorems  about  convergence  of  power  sequences  apply  to  n x n matrices  with  n linearly 
independent  eigenvectors  (symmetric  matrices,  for  example),  so  we  will  limit  our  discussion  to  this  case  in  this  section. 


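THEOREM 9.2.1

Let A be a symmetric n × n matrix with a positive dominant eigenvalue $\lambda$. If $\mathbf{x}_0$ is a unit vector in $R^n$ that is not orthogonal to the eigenspace corresponding to $\lambda$, then the normalized power sequence

\[
\mathbf{x}_0,\quad \mathbf{x}_1 = \frac{A\mathbf{x}_0}{\|A\mathbf{x}_0\|},\quad \mathbf{x}_2 = \frac{A\mathbf{x}_1}{\|A\mathbf{x}_1\|},\quad \ldots,\quad \mathbf{x}_k = \frac{A\mathbf{x}_{k-1}}{\|A\mathbf{x}_{k-1}\|},\quad \ldots \tag{1}
\]

converges to a unit dominant eigenvector, and the sequence

\[
A\mathbf{x}_1\cdot\mathbf{x}_1,\quad A\mathbf{x}_2\cdot\mathbf{x}_2,\quad A\mathbf{x}_3\cdot\mathbf{x}_3,\quad \ldots,\quad A\mathbf{x}_k\cdot\mathbf{x}_k,\quad \ldots \tag{2}
\]

converges to the dominant eigenvalue $\lambda$.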


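In the exercises we will ask you to show that (1) can also be expressed as

\[
\mathbf{x}_0,\quad \mathbf{x}_1 = \frac{A\mathbf{x}_0}{\|A\mathbf{x}_0\|},\quad \mathbf{x}_2 = \frac{A^2\mathbf{x}_0}{\|A^2\mathbf{x}_0\|},\quad \ldots,\quad \mathbf{x}_k = \frac{A^k\mathbf{x}_0}{\|A^k\mathbf{x}_0\|},\quad \ldots \tag{3}
\]

This form of the power sequence expresses each iterate in terms of the starting vector $\mathbf{x}_0$, rather than in terms of its predecessor.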


We will not prove Theorem 9.2.1, but we can make it plausible geometrically in the 2 × 2 case where A is a symmetric matrix with distinct positive eigenvalues, $\lambda_1$ and $\lambda_2$, one of which is dominant. To be specific, assume that $\lambda_1$ is dominant and

\[
\lambda_1 > \lambda_2 > 0
\]

Since we are assuming that A is symmetric and has distinct eigenvalues, it follows from Theorem 7.2.2 that the eigenspaces corresponding to $\lambda_1$ and $\lambda_2$ are perpendicular lines through the origin. Thus, the assumption that $\mathbf{x}_0$ is a unit vector that is not orthogonal to the eigenspace corresponding to $\lambda_1$ implies that $\mathbf{x}_0$ does not lie in the eigenspace corresponding to $\lambda_2$. To see the geometric effect of multiplying $\mathbf{x}_0$ by A, it will be useful to split $\mathbf{x}_0$ into the sum

\[
\mathbf{x}_0 = \mathbf{v}_0 + \mathbf{w}_0 \tag{4}
\]

where $\mathbf{v}_0$ and $\mathbf{w}_0$ are the orthogonal projections of $\mathbf{x}_0$ on the eigenspaces of $\lambda_1$ and $\lambda_2$, respectively (Figure 9.2.1a).

This enables us to express $A\mathbf{x}_0$ as

\[
A\mathbf{x}_0 = A\mathbf{v}_0 + A\mathbf{w}_0 = \lambda_1\mathbf{v}_0 + \lambda_2\mathbf{w}_0 \tag{5}
\]

which tells us that multiplying $\mathbf{x}_0$ by A "scales" the terms $\mathbf{v}_0$ and $\mathbf{w}_0$ in (4) by $\lambda_1$ and $\lambda_2$, respectively. However, $\lambda_1$ is larger than $\lambda_2$, so the scaling is greater in the direction of $\mathbf{v}_0$ than in the direction of $\mathbf{w}_0$. Thus, multiplying $\mathbf{x}_0$ by A "pulls" $\mathbf{x}_0$ toward the eigenspace of $\lambda_1$, and normalizing produces a vector $\mathbf{x}_1 = A\mathbf{x}_0/\|A\mathbf{x}_0\|$, which is on the unit circle and is closer to the eigenspace of $\lambda_1$ than $\mathbf{x}_0$ (Figure 9.2.1b). Similarly, multiplying $\mathbf{x}_1$ by A and normalizing produces a unit vector $\mathbf{x}_2$ that is closer to the eigenspace of $\lambda_1$ than $\mathbf{x}_1$. Thus, it seems reasonable that by repeatedly multiplying by A and normalizing we will produce a sequence of vectors $\mathbf{x}_k$ that lie on the unit circle and converge to a unit vector $\mathbf{x}$ in the eigenspace of $\lambda_1$ (Figure 9.2.1c). Moreover, if $\mathbf{x}_k$ converges to $\mathbf{x}$, then it also seems reasonable that $A\mathbf{x}_k\cdot\mathbf{x}_k$ will converge to

\[
A\mathbf{x}\cdot\mathbf{x} = \lambda_1\mathbf{x}\cdot\mathbf{x} = \lambda_1\|\mathbf{x}\|^2 = \lambda_1
\]

which is the dominant eigenvalue of A.


The  Power  Method  with  Euclidean  Scaling 

Theorem 9.2.1 provides us with an algorithm for approximating the dominant eigenvalue and a corresponding unit eigenvector of a symmetric matrix A, provided the dominant eigenvalue is positive. This algorithm, called the power method with Euclidean scaling, is as follows:

The Power Method with Euclidean Scaling

Step 1. Choose an arbitrary nonzero vector and normalize it, if need be, to obtain a unit vector $\mathbf{x}_0$.

Step 2. Compute $A\mathbf{x}_0$ and normalize it to obtain the first approximation $\mathbf{x}_1$ to a dominant unit eigenvector. Compute $A\mathbf{x}_1\cdot\mathbf{x}_1$ to obtain the first approximation to the dominant eigenvalue.

Step 3. Compute $A\mathbf{x}_1$ and normalize it to obtain the second approximation $\mathbf{x}_2$ to a dominant unit eigenvector. Compute $A\mathbf{x}_2\cdot\mathbf{x}_2$ to obtain the second approximation to the dominant eigenvalue.

Step 4. Compute $A\mathbf{x}_2$ and normalize it to obtain the third approximation $\mathbf{x}_3$ to a dominant unit eigenvector. Compute $A\mathbf{x}_3\cdot\mathbf{x}_3$ to obtain the third approximation to the dominant eigenvalue.

Continuing in this way will usually generate a sequence of better and better approximations to the dominant eigenvalue and a corresponding unit eigenvector.
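
The steps of the power method with Euclidean scaling translate almost line for line into code. The following Python sketch is an illustration only: the function and helper names are hypothetical, matrices are plain lists of rows, and no check is made that A is symmetric or has a positive dominant eigenvalue.

```python
import math


def mat_vec(A, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def power_method_euclidean(A, x0, iterations):
    """Return [(eigenvalue estimate, unit vector)] for k = 1, ..., iterations."""
    norm0 = math.sqrt(dot(x0, x0))
    x = [xi / norm0 for xi in x0]  # Step 1: unit starting vector
    history = []
    for _ in range(iterations):
        y = mat_vec(A, x)
        norm = math.sqrt(dot(y, y))
        x = [yi / norm for yi in y]  # normalized iterate x_k
        eigenvalue = dot(mat_vec(A, x), x)  # A x_k . x_k
        history.append((eigenvalue, x))
    return history


# A = [[3, 2], [2, 3]] with x0 = (1, 0), five iterations.
for k, (lam, x) in enumerate(power_method_euclidean([[3, 2], [2, 3]], [1, 0], 5), 1):
    print(k, round(lam, 5), [round(c, 5) for c in x])
```

For this matrix the first eigenvalue estimate is 63/13 ≈ 4.84615, and the estimates increase toward the dominant eigenvalue 5.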


EXAMPLE 2 The Power Method with Euclidean Scaling

Apply the power method with Euclidean scaling to

\[
A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}
\quad\text{with}\quad
\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

Stop at $\mathbf{x}_5$ and compare the resulting approximations to the exact values of the dominant eigenvalue and eigenvector.

We will leave it for you to show that the eigenvalues of A are $\lambda = 1$ and $\lambda = 5$ and that the eigenspace corresponding to the dominant eigenvalue $\lambda = 5$ is the line represented by the parametric equations $x_1 = t,\ x_2 = t$, which we can write in vector form as

\[
\mathbf{x} = t\begin{bmatrix} 1 \\ 1 \end{bmatrix} \tag{6}
\]

Setting $t = 1/\sqrt{2}$ yields the normalized dominant eigenvector

\[
\mathbf{v}_1 = \begin{bmatrix} \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} \end{bmatrix}
\approx \begin{bmatrix} 0.707106781187\ldots \\ 0.707106781187\ldots \end{bmatrix} \tag{7}
\]

Now let us see what happens when we use the power method, starting with the unit vector $\mathbf{x}_0$.
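\[
\begin{aligned}
A\mathbf{x}_0 &= \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}
& \mathbf{x}_1 &= \frac{A\mathbf{x}_0}{\|A\mathbf{x}_0\|} \approx \frac{1}{3.60555}\begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix} \\
A\mathbf{x}_1 &\approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix} \approx \begin{bmatrix} 3.60555 \\ 3.32820 \end{bmatrix}
& \mathbf{x}_2 &= \frac{A\mathbf{x}_1}{\|A\mathbf{x}_1\|} \approx \frac{1}{4.90682}\begin{bmatrix} 3.60555 \\ 3.32820 \end{bmatrix} \approx \begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix} \\
A\mathbf{x}_2 &\approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix} \approx \begin{bmatrix} 3.56097 \\ 3.50445 \end{bmatrix}
& \mathbf{x}_3 &= \frac{A\mathbf{x}_2}{\|A\mathbf{x}_2\|} \approx \frac{1}{4.99616}\begin{bmatrix} 3.56097 \\ 3.50445 \end{bmatrix} \approx \begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix} \\
A\mathbf{x}_3 &\approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix} \approx \begin{bmatrix} 3.54108 \\ 3.52976 \end{bmatrix}
& \mathbf{x}_4 &= \frac{A\mathbf{x}_3}{\|A\mathbf{x}_3\|} \approx \frac{1}{4.99985}\begin{bmatrix} 3.54108 \\ 3.52976 \end{bmatrix} \approx \begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix} \\
A\mathbf{x}_4 &\approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix} \approx \begin{bmatrix} 3.53666 \\ 3.53440 \end{bmatrix}
& \mathbf{x}_5 &= \frac{A\mathbf{x}_4}{\|A\mathbf{x}_4\|} \approx \frac{1}{4.99999}\begin{bmatrix} 3.53666 \\ 3.53440 \end{bmatrix} \approx \begin{bmatrix} 0.70733 \\ 0.70688 \end{bmatrix}
\end{aligned}
\]

\[
\begin{aligned}
\lambda^{(1)} &= (A\mathbf{x}_1)\cdot\mathbf{x}_1 = (A\mathbf{x}_1)^T\mathbf{x}_1 \approx \begin{bmatrix} 3.60555 & 3.32820 \end{bmatrix}\begin{bmatrix} 0.83205 \\ 0.55470 \end{bmatrix} \approx 4.84615 \\
\lambda^{(2)} &= (A\mathbf{x}_2)\cdot\mathbf{x}_2 = (A\mathbf{x}_2)^T\mathbf{x}_2 \approx \begin{bmatrix} 3.56097 & 3.50445 \end{bmatrix}\begin{bmatrix} 0.73480 \\ 0.67828 \end{bmatrix} \approx 4.99361 \\
\lambda^{(3)} &= (A\mathbf{x}_3)\cdot\mathbf{x}_3 = (A\mathbf{x}_3)^T\mathbf{x}_3 \approx \begin{bmatrix} 3.54108 & 3.52976 \end{bmatrix}\begin{bmatrix} 0.71274 \\ 0.70143 \end{bmatrix} \approx 4.99974 \\
\lambda^{(4)} &= (A\mathbf{x}_4)\cdot\mathbf{x}_4 = (A\mathbf{x}_4)^T\mathbf{x}_4 \approx \begin{bmatrix} 3.53666 & 3.53440 \end{bmatrix}\begin{bmatrix} 0.70824 \\ 0.70597 \end{bmatrix} \approx 4.99999 \\
\lambda^{(5)} &= (A\mathbf{x}_5)\cdot\mathbf{x}_5 = (A\mathbf{x}_5)^T\mathbf{x}_5 \approx \begin{bmatrix} 3.53576 & 3.53531 \end{bmatrix}\begin{bmatrix} 0.70733 \\ 0.70688 \end{bmatrix} \approx 5.00000
\end{aligned}
\]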


Thus, $\lambda^{(5)}$ approximates the dominant eigenvalue to five decimal place accuracy and $\mathbf{x}_5$ approximates the dominant eigenvector in (7) correctly to three decimal place accuracy.

[Margin note: It is accidental that $\lambda^{(5)}$ (the fifth approximation) produced five decimal place accuracy. In general, n iterations need not produce n decimal place accuracy.]


The  Power  Method  with  Maximum  Entry  Scaling 

There is a variation of the power method in which the iterates, rather than being normalized at each stage, are scaled to make the maximum entry 1. To describe this method, it will be convenient to denote the maximum absolute value of the entries in a vector $\mathbf{x}$ by $\max(\mathbf{x})$. Thus, for example, if

\[
\mathbf{x} = \begin{bmatrix} 5 \\ 3 \\ -7 \\ 2 \end{bmatrix}
\]

then $\max(\mathbf{x}) = 7$. We will need the following variation of Theorem 9.2.1.


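THEOREM 9.2.2

Let A be a symmetric n × n matrix with a positive dominant eigenvalue $\lambda$. If $\mathbf{x}_0$ is a nonzero vector in $R^n$ that is not orthogonal to the eigenspace corresponding to $\lambda$, then the sequence

\[
\mathbf{x}_0,\quad \mathbf{x}_1 = \frac{A\mathbf{x}_0}{\max(A\mathbf{x}_0)},\quad \mathbf{x}_2 = \frac{A\mathbf{x}_1}{\max(A\mathbf{x}_1)},\quad \ldots,\quad \mathbf{x}_k = \frac{A\mathbf{x}_{k-1}}{\max(A\mathbf{x}_{k-1})},\quad \ldots \tag{8}
\]

converges to an eigenvector corresponding to $\lambda$, and the sequence

\[
\frac{A\mathbf{x}_1\cdot\mathbf{x}_1}{\mathbf{x}_1\cdot\mathbf{x}_1},\quad \frac{A\mathbf{x}_2\cdot\mathbf{x}_2}{\mathbf{x}_2\cdot\mathbf{x}_2},\quad \frac{A\mathbf{x}_3\cdot\mathbf{x}_3}{\mathbf{x}_3\cdot\mathbf{x}_3},\quad \ldots,\quad \frac{A\mathbf{x}_k\cdot\mathbf{x}_k}{\mathbf{x}_k\cdot\mathbf{x}_k},\quad \ldots \tag{9}
\]

converges to $\lambda$.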


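In the exercises we will ask you to show that (8) can be written in the alternative form

\[
\mathbf{x}_0,\quad \mathbf{x}_1 = \frac{A\mathbf{x}_0}{\max(A\mathbf{x}_0)},\quad \mathbf{x}_2 = \frac{A^2\mathbf{x}_0}{\max(A^2\mathbf{x}_0)},\quad \ldots,\quad \mathbf{x}_k = \frac{A^k\mathbf{x}_0}{\max(A^k\mathbf{x}_0)},\quad \ldots \tag{10}
\]

which expresses the iterates in terms of the initial vector $\mathbf{x}_0$.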


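We will omit the proof of this theorem, but if we accept that (8) converges to an eigenvector of A, then it is not hard to see why (9) converges to the dominant eigenvalue. For this purpose we note that each term in (9) is of the form

\[
\frac{A\mathbf{x}\cdot\mathbf{x}}{\mathbf{x}\cdot\mathbf{x}} \tag{11}
\]

which is called a Rayleigh quotient of A. In the case where $\lambda$ is an eigenvalue of A and $\mathbf{x}$ is a corresponding eigenvector, the Rayleigh quotient is

\[
\frac{A\mathbf{x}\cdot\mathbf{x}}{\mathbf{x}\cdot\mathbf{x}} = \frac{\lambda\mathbf{x}\cdot\mathbf{x}}{\mathbf{x}\cdot\mathbf{x}} = \frac{\lambda(\mathbf{x}\cdot\mathbf{x})}{\mathbf{x}\cdot\mathbf{x}} = \lambda
\]

Thus, if $\mathbf{x}_k$ converges to a dominant eigenvector $\mathbf{x}$, then it seems reasonable that

\[
\frac{A\mathbf{x}_k\cdot\mathbf{x}_k}{\mathbf{x}_k\cdot\mathbf{x}_k}
\quad\text{converges to}\quad
\frac{A\mathbf{x}\cdot\mathbf{x}}{\mathbf{x}\cdot\mathbf{x}} = \lambda
\]

which is the dominant eigenvalue.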


Theorem 9.2.2 produces the following algorithm, called the power method with maximum entry scaling.

The Power Method with Maximum Entry Scaling

Step 1. Choose an arbitrary nonzero vector $\mathbf{x}_0$.

Step 2. Compute $A\mathbf{x}_0$ and multiply it by the factor $1/\max(A\mathbf{x}_0)$ to obtain the first approximation $\mathbf{x}_1$ to a dominant eigenvector. Compute the Rayleigh quotient of $\mathbf{x}_1$ to obtain the first approximation to the dominant eigenvalue.

Step 3. Compute $A\mathbf{x}_1$ and scale it by the factor $1/\max(A\mathbf{x}_1)$ to obtain the second approximation $\mathbf{x}_2$ to a dominant eigenvector. Compute the Rayleigh quotient of $\mathbf{x}_2$ to obtain the second approximation to the dominant eigenvalue.

Step 4. Compute $A\mathbf{x}_2$ and scale it by the factor $1/\max(A\mathbf{x}_2)$ to obtain the third approximation $\mathbf{x}_3$ to a dominant eigenvector. Compute the Rayleigh quotient of $\mathbf{x}_3$ to obtain the third approximation to the dominant eigenvalue.

Continuing in this way will generate a sequence of better and better approximations to the dominant eigenvalue and a corresponding eigenvector.
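
As with Euclidean scaling, the procedure can be sketched in a few lines of Python. The names below are hypothetical and the code is only an illustration: it divides each iterate by the maximum absolute value of its entries, as in the definition of max(x) above, and uses the Rayleigh quotient for the eigenvalue estimate.

```python
def mat_vec(A, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def power_method_max_entry(A, x0, iterations):
    """Return [(Rayleigh quotient, scaled vector)] for k = 1, ..., iterations."""
    x = list(x0)
    history = []
    for _ in range(iterations):
        y = mat_vec(A, x)
        scale = max(abs(yi) for yi in y)  # max(A x_{k-1})
        x = [yi / scale for yi in y]  # iterate x_k
        rayleigh = dot(mat_vec(A, x), x) / dot(x, x)
        history.append((rayleigh, x))
    return history


# A = [[3, 2], [2, 3]] with x0 = (1, 0), five iterations.
for k, (lam, x) in enumerate(power_method_max_entry([[3, 2], [2, 3]], [1, 0], 5), 1):
    print(k, round(lam, 5), [round(c, 5) for c in x])
```

Because the Rayleigh quotient does not change when a vector is rescaled, the eigenvalue estimates produced here coincide with those obtained under Euclidean scaling; only the scaling of the eigenvector approximations differs.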


John  William  Strutt  Rayleigh  (1842-1919) 


The British mathematical physicist John Rayleigh won the Nobel prize in physics in 1904 for
his  discovery  of  the  inert  gas  argon.  Rayleigh  also  made  fundamental  discoveries  in  acoustics  and  optics,  and 
his  work  in  wave  phenomena  enabled  him  to  give  the  first  accurate  explanation  of  why  the  sky  is  blue. 

[Image:  The  Granger  Collection,  New  York ;] 


EXAMPLE 3 Example 2 Revisited Using Maximum Entry Scaling

Apply the power method with maximum entry scaling to

\[
A = \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}
\quad\text{with}\quad
\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

Stop at $\mathbf{x}_5$ and compare the resulting approximations to the exact values and to the approximations obtained in Example 2.

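We leave it for you to confirm that

\[
\begin{aligned}
A\mathbf{x}_0 &= \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}
& \mathbf{x}_1 &= \frac{A\mathbf{x}_0}{\max(A\mathbf{x}_0)} = \frac{1}{3}\begin{bmatrix} 3 \\ 2 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.66667 \end{bmatrix} \\
A\mathbf{x}_1 &\approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.66667 \end{bmatrix} \approx \begin{bmatrix} 4.33333 \\ 4.00000 \end{bmatrix}
& \mathbf{x}_2 &= \frac{A\mathbf{x}_1}{\max(A\mathbf{x}_1)} \approx \frac{1}{4.33333}\begin{bmatrix} 4.33333 \\ 4.00000 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.92308 \end{bmatrix} \\
A\mathbf{x}_2 &\approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.92308 \end{bmatrix} \approx \begin{bmatrix} 4.84615 \\ 4.76923 \end{bmatrix}
& \mathbf{x}_3 &= \frac{A\mathbf{x}_2}{\max(A\mathbf{x}_2)} \approx \frac{1}{4.84615}\begin{bmatrix} 4.84615 \\ 4.76923 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.98413 \end{bmatrix} \\
A\mathbf{x}_3 &\approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.98413 \end{bmatrix} \approx \begin{bmatrix} 4.96825 \\ 4.95238 \end{bmatrix}
& \mathbf{x}_4 &= \frac{A\mathbf{x}_3}{\max(A\mathbf{x}_3)} \approx \frac{1}{4.96825}\begin{bmatrix} 4.96825 \\ 4.95238 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.99681 \end{bmatrix} \\
A\mathbf{x}_4 &\approx \begin{bmatrix} 3 & 2 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1.00000 \\ 0.99681 \end{bmatrix} \approx \begin{bmatrix} 4.99361 \\ 4.99042 \end{bmatrix}
& \mathbf{x}_5 &= \frac{A\mathbf{x}_4}{\max(A\mathbf{x}_4)} \approx \frac{1}{4.99361}\begin{bmatrix} 4.99361 \\ 4.99042 \end{bmatrix} \approx \begin{bmatrix} 1.00000 \\ 0.99936 \end{bmatrix}
\end{aligned}
\]

\[
\begin{aligned}
\lambda^{(1)} &= \frac{A\mathbf{x}_1\cdot\mathbf{x}_1}{\mathbf{x}_1\cdot\mathbf{x}_1} = \frac{(A\mathbf{x}_1)^T\mathbf{x}_1}{\mathbf{x}_1^T\mathbf{x}_1} \approx \frac{7.00000}{1.44444} \approx 4.84615 \\
\lambda^{(2)} &= \frac{A\mathbf{x}_2\cdot\mathbf{x}_2}{\mathbf{x}_2\cdot\mathbf{x}_2} = \frac{(A\mathbf{x}_2)^T\mathbf{x}_2}{\mathbf{x}_2^T\mathbf{x}_2} \approx \frac{9.24852}{1.85207} \approx 4.99361 \\
\lambda^{(3)} &= \frac{A\mathbf{x}_3\cdot\mathbf{x}_3}{\mathbf{x}_3\cdot\mathbf{x}_3} = \frac{(A\mathbf{x}_3)^T\mathbf{x}_3}{\mathbf{x}_3^T\mathbf{x}_3} \approx \frac{9.84203}{1.96851} \approx 4.99974 \\
\lambda^{(4)} &= \frac{A\mathbf{x}_4\cdot\mathbf{x}_4}{\mathbf{x}_4\cdot\mathbf{x}_4} = \frac{(A\mathbf{x}_4)^T\mathbf{x}_4}{\mathbf{x}_4^T\mathbf{x}_4} \approx \frac{9.96808}{1.99362} \approx 4.99999 \\
\lambda^{(5)} &= \frac{A\mathbf{x}_5\cdot\mathbf{x}_5}{\mathbf{x}_5\cdot\mathbf{x}_5} = \frac{(A\mathbf{x}_5)^T\mathbf{x}_5}{\mathbf{x}_5^T\mathbf{x}_5} \approx \frac{9.99360}{1.99872} \approx 5.00000
\end{aligned}
\]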

Thus, $\lambda^{(5)}$ approximates the dominant eigenvalue correctly to five decimal places and $\mathbf{x}_5$ closely approximates the dominant eigenvector

\[
\mathbf{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]

that results by taking $t = 1$ in (6).

[Margin note: Whereas the power method with Euclidean scaling produces a sequence that approaches a unit dominant eigenvector, maximum entry scaling produces a sequence that approaches an eigenvector whose largest component is 1.]


Rate  of  Convergence 

If A is a symmetric matrix whose distinct eigenvalues can be arranged so that

\[
|\lambda_1| > |\lambda_2| \geq |\lambda_3| \geq \cdots \geq |\lambda_k|
\]

then the "rate" at which the Rayleigh quotients converge to the dominant eigenvalue $\lambda_1$ depends on the ratio $|\lambda_1|/|\lambda_2|$; that is, the convergence is slow when this ratio is near 1 and rapid when it is large: the greater the ratio, the more rapid the convergence. For example, if A is a 2 × 2 symmetric matrix, then the greater the ratio $|\lambda_1|/|\lambda_2|$, the greater the disparity between the scaling effects of $\lambda_1$ and $\lambda_2$ in Figure 9.2.1, and hence the greater the effect that multiplication by A has on pulling the iterates toward the eigenspace of $\lambda_1$. Indeed, the rapid convergence in Example 3 is due to the fact that $|\lambda_1|/|\lambda_2| = 5/1 = 5$, which is considered to be a large ratio. In cases where the ratio is close to 1, the convergence of the power method may be so slow that other methods must be used.
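
The dependence on the ratio can be made explicit in the 2 × 2 symmetric case. The following short derivation is a sketch under assumptions that go slightly beyond the text: $\mathbf{u}_1$ and $\mathbf{u}_2$ denote orthonormal eigenvectors for $\lambda_1$ and $\lambda_2$, and $c_1 \neq 0$, $c_2$ are the coefficients of the starting vector (this notation is introduced here only for illustration). Writing $\mathbf{x}_0 = c_1\mathbf{u}_1 + c_2\mathbf{u}_2$, the kth iterate is a scalar multiple of $A^k\mathbf{x}_0 = c_1\lambda_1^k\mathbf{u}_1 + c_2\lambda_2^k\mathbf{u}_2$, and since the Rayleigh quotient is unchanged by scaling,

\[
\lambda^{(k)} = \frac{c_1^2\lambda_1^{2k+1} + c_2^2\lambda_2^{2k+1}}{c_1^2\lambda_1^{2k} + c_2^2\lambda_2^{2k}},
\qquad
\lambda_1 - \lambda^{(k)} = \frac{(\lambda_1 - \lambda_2)\,c_2^2\,\lambda_2^{2k}}{c_1^2\lambda_1^{2k} + c_2^2\lambda_2^{2k}}
\approx (\lambda_1 - \lambda_2)\left(\frac{c_2}{c_1}\right)^{2}\left(\frac{\lambda_2}{\lambda_1}\right)^{2k}
\]

So each iteration shrinks the error in the eigenvalue estimate by roughly the factor $(\lambda_2/\lambda_1)^2$. For the matrix of Examples 2 and 3 this factor is $1/25$, which is consistent with the errors $5 - 4.84615 \approx 0.15385$, $5 - 4.99361 = 0.00639$, and $5 - 4.99974 = 0.00026$ in the first three approximations.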


Stopping  Procedures 

If $\lambda$ is the exact value of the dominant eigenvalue, and if a power method produces the approximation $\lambda^{(k)}$ at the kth iteration, then we call

\[
\left|\frac{\lambda - \lambda^{(k)}}{\lambda}\right| \tag{12}
\]

the relative error in $\lambda^{(k)}$. If this is expressed as a percentage, then it is called the percentage error in $\lambda^{(k)}$. For example, if $\lambda = 5$ and the approximation after three iterations is $\lambda^{(3)} = 5.1$, then

\[
\text{relative error in } \lambda^{(3)} = \left|\frac{\lambda - \lambda^{(3)}}{\lambda}\right| = \left|\frac{5 - 5.1}{5}\right| = |-0.02| = 0.02
\]

\[
\text{percentage error in } \lambda^{(3)} = 0.02 \times 100\% = 2\%
\]

In applications one usually knows the relative error E that can be tolerated in the dominant eigenvalue, so the goal is to stop computing iterates once the relative error in the approximation to that eigenvalue is less than E. However, there is a problem in computing the relative error from (12) in that the eigenvalue $\lambda$ is unknown. To circumvent this problem, it is usual to estimate $\lambda$ by $\lambda^{(k)}$ and stop the computations when

\[
\left|\frac{\lambda^{(k)} - \lambda^{(k-1)}}{\lambda^{(k)}}\right| < E \tag{13}
\]

The quantity on the left side of (13) is called the estimated relative error in $\lambda^{(k)}$ and its percentage form is called the estimated percentage error in $\lambda^{(k)}$.

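EXAMPLE 4 Estimated Relative Error

For the computations in Example 3, find the smallest value of k for which the estimated percentage error in $\lambda^{(k)}$ is less than 0.1%.

The estimated percentage errors in the approximations in Example 3 are as follows:

\[
\begin{array}{lll}
\text{APPROXIMATION} & \text{RELATIVE ERROR} & \text{PERCENTAGE ERROR} \\[4pt]
\lambda^{(2)}: & \left|\dfrac{\lambda^{(2)} - \lambda^{(1)}}{\lambda^{(2)}}\right| \approx \left|\dfrac{4.99361 - 4.84615}{4.99361}\right| \approx 0.02953 & = 2.953\% \\[10pt]
\lambda^{(3)}: & \left|\dfrac{\lambda^{(3)} - \lambda^{(2)}}{\lambda^{(3)}}\right| \approx \left|\dfrac{4.99974 - 4.99361}{4.99974}\right| \approx 0.00123 & = 0.123\% \\[10pt]
\lambda^{(4)}: & \left|\dfrac{\lambda^{(4)} - \lambda^{(3)}}{\lambda^{(4)}}\right| \approx \left|\dfrac{4.99999 - 4.99974}{4.99999}\right| \approx 0.00005 & = 0.005\% \\[10pt]
\lambda^{(5)}: & \left|\dfrac{\lambda^{(5)} - \lambda^{(4)}}{\lambda^{(5)}}\right| \approx \left|\dfrac{5.00000 - 4.99999}{5.00000}\right| \approx 0.00000 & = 0\%
\end{array}
\]

Thus, $\lambda^{(4)} \approx 4.99999$ is the first approximation whose estimated percentage error is less than 0.1%.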
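
The stopping test in (13) can be sketched as a small Python function. This is an illustration only; the name `first_below_tolerance` is hypothetical, and the list is assumed to hold the successive eigenvalue approximations starting with the first one.

```python
def first_below_tolerance(approximations, tolerance):
    """Return the smallest k (counting from 1) whose estimated relative error
    |(lambda_k - lambda_{k-1}) / lambda_k| is less than tolerance, or None."""
    for k in range(2, len(approximations) + 1):
        current = approximations[k - 1]
        previous = approximations[k - 2]
        if abs((current - previous) / current) < tolerance:
            return k
    return None


# Rounded approximations from Example 3; 0.1% corresponds to a tolerance of 0.001.
estimates = [4.84615, 4.99361, 4.99974, 4.99999, 5.00000]
print(first_below_tolerance(estimates, 0.001))  # 4
```

In an actual computation the same comparison would be made inside the iteration loop, so that no iterates are computed beyond the first one that meets the tolerance.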


A rule  for  deciding  when  to  stop  an  iterative  process  is  called  a stopping  procedure.  In  the  exercises,  we  will 
discuss  stopping  procedures  for  the  power  method  that  are  based  on  the  dominant  eigenvector  rather  than  the  dominant 
eigenvalue. 


Concept  Review 

Power  sequence 

Dominant  eigenvalue 

Dominant  eigenvector 

Power  method  with  Euclidean  scaling 

Rayleigh  quotient 

Power  method  with  maximum  entry  scaling 

Relative  error 

Percentage  error 

Estimated  relative  error 

Estimated  percentage  error 

Stopping  procedure 

Skills 

Identify  the  dominant  eigenvalue  of  a matrix. 

Use  the  power  methods  described  in  this  section  to  approximate  a dominant  eigenvector. 
Find  the  estimated  relative  and  percentage  errors  associated  with  the  power  methods. 


Exercise  Set  9.2 

In Exercises 1-2, the distinct eigenvalues of a matrix are given. Determine whether A has a dominant eigenvalue, and if so, find it.

1.

(a) $\lambda_1 = 7,\ \lambda_2 = 3,\ \lambda_3 = -8,\ \lambda_4 = 1$

(b) $\lambda_1 = -5,\ \lambda_2 = 3,\ \lambda_3 = 2,\ \lambda_4 = 5$

Answer:

(a) $\lambda_3$ dominant

(b) No dominant eigenvalue

2.

(a) $\lambda_1 = 1,\ \lambda_2 = 0,\ \lambda_3 = -3,\ \lambda_4 = 2$

(b) $\lambda_1 = -3,\ \lambda_2 = -2,\ \lambda_3 = -1,\ \lambda_4 = 3$


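In Exercises 3-4, apply the power method with Euclidean scaling to the matrix A, starting with $\mathbf{x}_0$ and stopping at $\mathbf{x}_4$. Compare the resulting approximations to the exact values of the dominant eigenvalue and the corresponding unit eigenvector.

3.
\[
A = \begin{bmatrix} 5 & -1 \\ -1 & -1 \end{bmatrix};\quad
\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

Answer:

\[
\mathbf{x}_1 \approx \begin{bmatrix} 0.98058 \\ -0.19612 \end{bmatrix};\quad
\mathbf{x}_2 \approx \begin{bmatrix} 0.98837 \\ -0.15206 \end{bmatrix};\quad
\mathbf{x}_3 \approx \begin{bmatrix} 0.98679 \\ -0.16201 \end{bmatrix};\quad
\mathbf{x}_4 \approx \begin{bmatrix} 0.98715 \\ -0.15977 \end{bmatrix}
\]

dominant eigenvalue: $\lambda = 2 + \sqrt{10} \approx 5.16228$;

dominant eigenvector:
\[
\begin{bmatrix} 1 \\ 3 - \sqrt{10} \end{bmatrix}
\]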

4. 

1 

1 

O' 

T 

A = 

-2  6 

-2 

; x0  = 

0 

0 -2 

5 

0 

1 

—0.16228 


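In Exercises 5-6, apply the power method with maximum entry scaling to the matrix A, starting with $\mathbf{x}_0$ and stopping at $\mathbf{x}_4$. Compare the resulting approximations to the exact values of the dominant eigenvalue and the corresponding scaled eigenvector.

5.
\[
A = \begin{bmatrix} 1 & -3 \\ -3 & 5 \end{bmatrix};\quad
\mathbf{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]

Answer:

\[
\mathbf{x}_1 = \begin{bmatrix} -1 \\ 1 \end{bmatrix},\ \lambda^{(1)} = 6;\quad
\mathbf{x}_2 = \begin{bmatrix} -0.5 \\ 1 \end{bmatrix},\ \lambda^{(2)} = 6.6;\quad
\mathbf{x}_3 \approx \begin{bmatrix} -0.53846 \\ 1 \end{bmatrix},\ \lambda^{(3)} \approx 6.60550;\quad
\mathbf{x}_4 \approx \begin{bmatrix} -0.53488 \\ 1 \end{bmatrix},\ \lambda^{(4)} \approx 6.60555
\]

dominant eigenvalue: $\lambda = 3 + \sqrt{13} \approx 6.60555$;

dominant eigenvector:
\[
\begin{bmatrix} -\dfrac{3}{\sqrt{26 + 4\sqrt{13}}} \\[10pt] \dfrac{2 + \sqrt{13}}{\sqrt{26 + 4\sqrt{13}}} \end{bmatrix}
\]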

6. 

'3  2 2~ 

A = 

2 2 0 

; x0  = 

2 0 4 

-0.47186 

0.88167 


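7. Let

\[
A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix};\quad
\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

(a) Use the power method with maximum entry scaling to approximate a dominant eigenvector of A. Start with $\mathbf{x}_0$, round off all computations to three decimal places, and stop after three iterations.

(b) Use the result in part (a) and the Rayleigh quotient to approximate the dominant eigenvalue of A.

(c) Find the exact values of the eigenvector and eigenvalue approximated in parts (a) and (b).

(d) Find the percentage error in the approximation of the dominant eigenvalue.

Answer:

(a)
\[
\mathbf{x}_1 = \begin{bmatrix} 1 \\ -0.5 \end{bmatrix},\quad
\mathbf{x}_2 = \begin{bmatrix} 1 \\ -0.8 \end{bmatrix},\quad
\mathbf{x}_3 \approx \begin{bmatrix} 1 \\ -0.929 \end{bmatrix}
\]

(b) $\lambda^{(1)} = 2.8,\ \lambda^{(2)} \approx 2.976,\ \lambda^{(3)} \approx 2.997$

(c) Dominant eigenvalue: $\lambda = 3$; dominant eigenvector: $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$

(d) 0.1%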

8.  Repeat  the  directions  of  Exercise  7 with 


-1 


"2  1 O' 

T 

A = 

1 2 0 

; x0  = 

1 

O 

o 

o 

1 

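In Exercises 9-10, a matrix A with a dominant eigenvalue and a sequence $\mathbf{x}_0, A\mathbf{x}_0, \ldots, A^5\mathbf{x}_0$ are given. Use Formulas (9) and (10) to approximate the dominant eigenvalue and a corresponding eigenvector.

9.
\[
A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix};\quad
\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\quad
A\mathbf{x}_0 = \begin{bmatrix} 1 \\ 2 \end{bmatrix},\quad
A^2\mathbf{x}_0 = \begin{bmatrix} 5 \\ 4 \end{bmatrix},\quad
A^3\mathbf{x}_0 = \begin{bmatrix} 13 \\ 14 \end{bmatrix},\quad
A^4\mathbf{x}_0 = \begin{bmatrix} 41 \\ 40 \end{bmatrix},\quad
A^5\mathbf{x}_0 = \begin{bmatrix} 121 \\ 122 \end{bmatrix}
\]

Answer:

\[
2.99993;\quad \begin{bmatrix} 0.99180 \\ 1.00000 \end{bmatrix}
\]

10.
\[
A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix};\quad
\mathbf{x}_0 = \begin{bmatrix} 0 \\ 1 \end{bmatrix},\quad
A\mathbf{x}_0 = \begin{bmatrix} 2 \\ 1 \end{bmatrix},\quad
A^2\mathbf{x}_0 = \begin{bmatrix} 4 \\ 5 \end{bmatrix},\quad
A^3\mathbf{x}_0 = \begin{bmatrix} 14 \\ 13 \end{bmatrix},\quad
A^4\mathbf{x}_0 = \begin{bmatrix} 40 \\ 41 \end{bmatrix},\quad
A^5\mathbf{x}_0 = \begin{bmatrix} 122 \\ 121 \end{bmatrix}
\]

11. Consider matrices

\[
A = \begin{bmatrix} -1 & 0 \\ 0 & 0 \end{bmatrix}
\quad\text{and}\quad
\mathbf{x}_0 = \begin{bmatrix} a \\ b \end{bmatrix}
\]

where $\mathbf{x}_0$ is a unit vector and $a \neq 0$. Show that even though the matrix A is symmetric and has a dominant eigenvalue, the power sequence (1) in Theorem 9.2.1 does not converge. This shows that the requirement in that theorem that the dominant eigenvalue be positive is essential.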


12. Use the power method with Euclidean scaling to approximate the dominant eigenvalue and a corresponding
eigenvector of $A$. Choose your own starting vector, and stop when the estimated percentage error in the eigenvalue
approximation is less than 0.1%.

(a) $\begin{bmatrix} 1 & 3 & 3 \\ 3 & 4 & -1 \\ 3 & -1 & 10 \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 0 & 1 & 1 \\ 0 & 2 & -1 & 1 \\ 1 & -1 & 4 & 1 \\ 1 & 1 & 1 & 8 \end{bmatrix}$

13.  Repeat  Exercise  12,  but  this  time  stop  when  all  corresponding  entries  in  two  successive  eigenvector  approximations 
differ  by  less  than  0.01  in  absolute  value. 

Answer: 


Starting  with  0 , it  takes  8 iterations. 

0 


Starting  with  , it  takes  8 iterations. 


14.  Repeat  Exercise  12  using  maximum  entry  scaling. 

15. Prove: If $A$ is a nonzero $n \times n$ matrix, then $A^TA$ and $AA^T$ have positive dominant eigenvalues.

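16. (For readers familiar with proof by induction) Let $A$ be an $n \times n$ matrix, let $\mathbf{x}_0$ be a unit vector in $R^n$, and define
the sequence $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \ldots$ by
\[
\mathbf{x}_1 = \frac{A\mathbf{x}_0}{\|A\mathbf{x}_0\|},\quad
\mathbf{x}_2 = \frac{A\mathbf{x}_1}{\|A\mathbf{x}_1\|},\ \ldots,\
\mathbf{x}_k = \frac{A\mathbf{x}_{k-1}}{\|A\mathbf{x}_{k-1}\|},\ \ldots
\]
Prove by induction that $\mathbf{x}_k = A^k\mathbf{x}_0 / \|A^k\mathbf{x}_0\|$.

17. (For readers familiar with proof by induction) Let $A$ be an $n \times n$ matrix, let $\mathbf{x}_0$ be a nonzero vector in $R^n$, and
define the sequence $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \ldots$ by
\[
\mathbf{x}_1 = \frac{A\mathbf{x}_0}{\max(A\mathbf{x}_0)},\quad
\mathbf{x}_2 = \frac{A\mathbf{x}_1}{\max(A\mathbf{x}_1)},\ \ldots,\
\mathbf{x}_k = \frac{A\mathbf{x}_{k-1}}{\max(A\mathbf{x}_{k-1})},\ \ldots
\]
Prove by induction that $\mathbf{x}_k = A^k\mathbf{x}_0 / \max(A^k\mathbf{x}_0)$.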
9.3  Internet Search Engines

Early  search  engines  on  the  Internet  worked  by  examining  key  words  and  phrases  in  pages  and  titles  of  posted  documents.  Today's  most  popular  search  engines  use  algorithms 
based  on  the  power  method  to  analyze  hyperlinks  (references)  between  documents.  In  this  section  we  will  discuss  one  of  the  ways  in  which  this  is  done. 

Google,  the  most  widely  used  engine  for  searching  the  Internet,  was  developed  in  1996  by  Larry  Page  and  Sergey  Brin  while  both  were  graduate  students  at  Stanford  University. 
Google  uses  a procedure  known  as  the  PageRank  algorithm  to  analyze  how  documents  at  relevant  sites  reference  one  another.  It  then  assigns  to  each  site  a PageRank  score, 
stores  those  scores  as  a matrix,  and  uses  the  components  of  the  dominant  eigenvector  of  that  matrix  to  establish  the  relative  importance  of  the  sites  to  the  search. 

Google starts by using a standard text-based search engine to find an initial set $S_0$ of sites containing relevant pages. Since words can have multiple meanings, the set $S_0$ will
typically contain irrelevant sites and miss others of relevance. To compensate for this, the set $S_0$ is expanded to a larger set $S$ by adjoining all sites referenced by the pages in the
sites of $S_0$. The underlying assumption is that $S$ will contain the most important sites relevant to the search. This process is then repeated a number of times to refine the search
information still further.

To be more specific, suppose that the search set $S$ contains $n$ sites, and define the adjacency matrix for $S$ to be the $n \times n$ matrix $A = [a_{ij}]$ in which

\[
a_{ij} = \begin{cases} 1 & \text{if site } i \text{ references site } j \\ 0 & \text{if site } i \text{ does not reference site } j \end{cases}
\]

We will assume that no site references itself, so the diagonal entries of $A$ will all be zero.


EXAMPLE  1 Adjacency  Matrices 


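Here is a typical adjacency matrix for a search set with four sites. The rows are indexed by the referencing site and the columns by the referenced site:

\[
A = \begin{array}{c|cccc}
 & 1 & 2 & 3 & 4 \\ \hline
1 & 0 & 0 & 1 & 1 \\
2 & 1 & 0 & 0 & 0 \\
3 & 1 & 0 & 0 & 1 \\
4 & 1 & 1 & 1 & 0
\end{array}
\tag{1}
\]

Thus, Site 1 references Sites 3 and 4, Site 2 references Site 1, and so forth.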


There  are  two  basic  roles  that  a site  can  play  in  the  search  process — the  site  may  be  a hub,  meaning  that  it  references  many  other  sites,  or  it  may  be  an  authority,  meaning  that  it 
is  referenced  by  many  other  sites.  A given  site  will  typically  have  both  hub  and  authority  properties  in  that  it  will  both  reference  and  be  referenced. 

The term google is a variation of the word googol, which stands for the number $10^{100}$ (1 followed by 100 zeros). This term was invented by the
American mathematician Edward Kasner (1878-1955) in 1938, and the story goes that it came about when Kasner asked his eight-year-old nephew to give a name to a
really big number; he responded with “googol.” Kasner then went on to define a googolplex to be $10^{\text{googol}}$ (1 followed by googol zeros).


In general, if $A$ is an adjacency matrix for $n$ sites, then the column sums of $A$ measure the authority aspect of the sites and the row sums of $A$ measure their hub aspect. For
example, the column sums of the matrix in (1) are 3, 1, 2, and 2, which means that Site 1 is referenced by three other sites, Site 2 is referenced by one other site, and so forth.
Similarly, the row sums of the matrix in (1) are 2, 1, 2, and 3, so Site 1 references two other sites, Site 2 references one other site, and so forth.

Accordingly, if $A$ is an adjacency matrix, then we call the vector $\mathbf{h}_0$ of row sums of $A$ the initial hub vector of $A$, and we call the vector $\mathbf{a}_0$ of column sums of $A$ the initial
authority vector of $A$. Alternatively, we can think of $\mathbf{a}_0$ as the vector of row sums of $A^T$, which turns out to be more convenient for computations. The entries in the hub vector
are called hub weights and those in the authority vector authority weights.


EXAMPLE  2 Initial  Hub  and  Authority  Vectors  of  an  Adjacency  Matrix 

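Find the initial hub and authority vectors for the adjacency matrix $A$ in Example 1.

The row sums of $A$ yield the initial hub vector

\[
\mathbf{h}_0 = \begin{bmatrix} 2 \\ 1 \\ 2 \\ 3 \end{bmatrix}
\begin{array}{l} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{array}
\tag{2}
\]

and the row sums of $A^T$ (the column sums of $A$) yield the initial authority vector

\[
\mathbf{a}_0 = \begin{bmatrix} 3 \\ 1 \\ 2 \\ 2 \end{bmatrix}
\begin{array}{l} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{array}
\tag{3}
\]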
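
The link counting in Example 2 can be reproduced in a few lines. The following is a minimal Python sketch, not part of the original text; the function names `initial_hub_vector` and `initial_authority_vector` are illustrative choices, and the matrix is the one from Example 1.

```python
# Adjacency matrix from Example 1: A[i][j] = 1 if Site i+1 references Site j+1.
A = [
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
]

def initial_hub_vector(A):
    """Row sums of A: the number of sites that each site references."""
    return [sum(row) for row in A]

def initial_authority_vector(A):
    """Column sums of A (row sums of A^T): the number of sites referencing each site."""
    return [sum(col) for col in zip(*A)]

h0 = initial_hub_vector(A)
a0 = initial_authority_vector(A)
print("h0 =", h0)  # [2, 1, 2, 3]
print("a0 =", a0)  # [3, 1, 2, 2]
```

Here `zip(*A)` iterates over the columns of `A`, which is the same as iterating over the rows of $A^T$.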


The link counting in Example 2 suggests that Site 4 is the major hub and Site 1 is the greatest authority. However, counting links does not tell the whole story; for example, it
seems reasonable that if Site 1 is to be considered the greatest authority, then more weight should be given to hubs that link to that site, and if Site 4 is to be considered a major
hub, then more weight should be given to sites to which it links. Thus, there is an interaction between hubs and authorities that needs to be accounted for in the search process.
Accordingly, once the search engine has calculated the initial authority vector $\mathbf{a}_0$, it then uses the information in that vector to create new hub and authority vectors $\mathbf{h}_1$ and $\mathbf{a}_1$
using the formulas


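\[
\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|}
\quad\text{and}\quad
\mathbf{a}_1 = \frac{A^T\mathbf{h}_1}{\|A^T\mathbf{h}_1\|}
\tag{4}
\]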


The numerators in these formulas do the weighting, and the normalization serves to control the size of the entries. To understand how the numerators accomplish the weighting,
view the product $A\mathbf{a}_0$ as a linear combination of the column vectors of $A$ with coefficients from $\mathbf{a}_0$. For example, with the adjacency matrix in Example 1 and the authority vector
calculated in Example 2 we have

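\[
A\mathbf{a}_0 =
\begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 3 \\ 1 \\ 2 \\ 2 \end{bmatrix}
= 3\begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}
+ 1\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
+ 2\begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}
+ 2\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 4 \\ 3 \\ 5 \\ 6 \end{bmatrix}
\begin{array}{l} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{array}
\]

(The columns of $A$ are indexed by the referenced site, 1 through 4.)
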

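Thus, we see that the links to each referenced site are weighted by the authority values in $\mathbf{a}_0$. To control the size of the entries, the search engine normalizes $A\mathbf{a}_0$ to produce the
updated hub vector

\[
\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|}
= \frac{1}{\sqrt{86}}\begin{bmatrix} 4 \\ 3 \\ 5 \\ 6 \end{bmatrix}
\approx \begin{bmatrix} 0.43133 \\ 0.32350 \\ 0.53916 \\ 0.64700 \end{bmatrix}
\begin{array}{l} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{array}
\qquad \text{New Hub Weights}
\]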


The new hub vector $\mathbf{h}_1$ can now be used to update the authority vector using Formula 4. The product $A^T\mathbf{h}_1$ performs the weighting, and the normalization controls the size:

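\[
A^T\mathbf{h}_1 \approx
\begin{bmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 0.43133 \\ 0.32350 \\ 0.53916 \\ 0.64700 \end{bmatrix}
= 0.43133\begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}
+ 0.32350\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ 0.53916\begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}
+ 0.64700\begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 1.50966 \\ 0.64700 \\ 1.07833 \\ 0.97049 \end{bmatrix}
\begin{array}{l} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{array}
\]

(The columns of $A^T$ are indexed by the referencing site, 1 through 4.)
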


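\[
\mathbf{a}_1 = \frac{A^T\mathbf{h}_1}{\|A^T\mathbf{h}_1\|}
\approx \frac{1}{2.19142}\begin{bmatrix} 1.50966 \\ 0.64700 \\ 1.07833 \\ 0.97049 \end{bmatrix}
\approx \begin{bmatrix} 0.68889 \\ 0.29524 \\ 0.49207 \\ 0.44286 \end{bmatrix}
\begin{array}{l} \text{Site 1} \\ \text{Site 2} \\ \text{Site 3} \\ \text{Site 4} \end{array}
\qquad \text{New Authority Weights}
\]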
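
The single update step in Formula 4 can be checked numerically. The sketch below is an illustration only, written with the Python standard library; the helper names `mat_vec`, `transpose`, and `normalize` are assumptions of this example and do not come from the text.

```python
import math

# Adjacency matrix from Example 1.
A = [
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
]

def mat_vec(M, v):
    """Matrix-vector product Mv."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def normalize(v):
    """Divide v by its Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

a0 = [sum(col) for col in zip(*A)]           # initial authority vector (column sums)
h1 = normalize(mat_vec(A, a0))               # h1 = A a0 / ||A a0||
a1 = normalize(mat_vec(transpose(A), h1))    # a1 = A^T h1 / ||A^T h1||

print([round(x, 5) for x in h1])  # compare with the new hub weights
print([round(x, 5) for x in a1])  # compare with the new authority weights
```

Because the code carries full floating-point precision rather than rounding $\mathbf{h}_1$ to five places first, the last printed digit can differ slightly from a hand computation.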


Once the updated hub and authority vectors, $\mathbf{h}_1$ and $\mathbf{a}_1$, are obtained, the search engine repeats the process and computes a succession of hub and authority vectors, thereby
generating the interrelated sequences


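\[
\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|},\quad
\mathbf{h}_2 = \frac{A\mathbf{a}_1}{\|A\mathbf{a}_1\|},\quad
\mathbf{h}_3 = \frac{A\mathbf{a}_2}{\|A\mathbf{a}_2\|},\ \ldots,\
\mathbf{h}_k = \frac{A\mathbf{a}_{k-1}}{\|A\mathbf{a}_{k-1}\|},\ \ldots
\tag{5}
\]

\[
\mathbf{a}_0,\quad
\mathbf{a}_1 = \frac{A^T\mathbf{h}_1}{\|A^T\mathbf{h}_1\|},\quad
\mathbf{a}_2 = \frac{A^T\mathbf{h}_2}{\|A^T\mathbf{h}_2\|},\quad
\mathbf{a}_3 = \frac{A^T\mathbf{h}_3}{\|A^T\mathbf{h}_3\|},\ \ldots,\
\mathbf{a}_k = \frac{A^T\mathbf{h}_k}{\|A^T\mathbf{h}_k\|},\ \ldots
\tag{6}
\]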


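However, each of these is a power sequence in disguise. For example, if we substitute the expression for $\mathbf{h}_k$ into the expression for $\mathbf{a}_k$, then we obtain

\[
\mathbf{a}_k = \frac{A^T\mathbf{h}_k}{\|A^T\mathbf{h}_k\|}
= \frac{A^T\left(\dfrac{A\mathbf{a}_{k-1}}{\|A\mathbf{a}_{k-1}\|}\right)}{\left\|A^T\left(\dfrac{A\mathbf{a}_{k-1}}{\|A\mathbf{a}_{k-1}\|}\right)\right\|}
= \frac{(A^TA)\mathbf{a}_{k-1}}{\|(A^TA)\mathbf{a}_{k-1}\|}
\]

which means that we can rewrite (6) as

\[
\mathbf{a}_0,\quad
\mathbf{a}_1 = \frac{(A^TA)\mathbf{a}_0}{\|(A^TA)\mathbf{a}_0\|},\quad
\mathbf{a}_2 = \frac{(A^TA)\mathbf{a}_1}{\|(A^TA)\mathbf{a}_1\|},\ \ldots,\
\mathbf{a}_k = \frac{(A^TA)\mathbf{a}_{k-1}}{\|(A^TA)\mathbf{a}_{k-1}\|},\ \ldots
\tag{7}
\]

Similarly, we can rewrite (5) as

\[
\mathbf{h}_1 = \frac{A\mathbf{a}_0}{\|A\mathbf{a}_0\|},\quad
\mathbf{h}_2 = \frac{(AA^T)\mathbf{h}_1}{\|(AA^T)\mathbf{h}_1\|},\ \ldots,\
\mathbf{h}_k = \frac{(AA^T)\mathbf{h}_{k-1}}{\|(AA^T)\mathbf{h}_{k-1}\|},\ \ldots
\tag{8}
\]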


In Exercise 15 of Section 9.2 you were asked to show that $A^TA$ and $AA^T$ both have positive dominant eigenvalues. That being the case, Theorem 9.2.1 ensures that (7)
and (8) converge to the dominant eigenvectors of $A^TA$ and $AA^T$, respectively. The entries in those eigenvectors are the authority and hub weights that Google uses to rank the
search sites in order of importance as hubs and authorities.
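
The claim that the alternating sequences (5) and (6) are a power sequence in disguise can be tested numerically. The following Python sketch is an illustration under stated assumptions, not the text's own procedure: the function names are hypothetical, the matrix is the one from Example 1, and 100 steps is an arbitrary cutoff chosen to be comfortably past convergence for this small matrix.

```python
import math

A = [
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
]

def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def mat_mul(X, Y):
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def authority_by_alternation(A, steps):
    """Sequences (5)-(6): h_k = A a_{k-1}/||.||, then a_k = A^T h_k/||.||."""
    At = transpose(A)
    a = [sum(col) for col in zip(*A)]   # a_0: column sums of A
    for _ in range(steps):
        h = normalize(mat_vec(A, a))
        a = normalize(mat_vec(At, h))
    return a

def authority_by_power_sequence(A, steps):
    """Sequence (7): a_k = (A^T A) a_{k-1} / ||(A^T A) a_{k-1}||."""
    AtA = mat_mul(transpose(A), A)
    a = [sum(col) for col in zip(*A)]
    for _ in range(steps):
        a = normalize(mat_vec(AtA, a))
    return a

a_alt = authority_by_alternation(A, 100)
a_pow = authority_by_power_sequence(A, 100)
print([round(x, 5) for x in a_alt])
print([round(x, 5) for x in a_pow])
```

The two printed vectors agree, and multiplying the result by $A^TA$ returns a scalar multiple of it, which is what one expects of a dominant eigenvector.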


EXAMPLE  3 A Ranking  Procedure 


Suppose  that  a search  engine  produces  10  Internet  sites  in  its  search  set  and  that  the  adjacency  matrix  for  those  sites  is 


Referenced  Site 
123456789  10 


A = 


1 1 
0 0 


0 0 
0 0 


1 

2 

3 

4 

5 

6 

7 

8 

9 

10 


Referencing  Site 


Use  Formula  7 to  rank  the  sites  in  decreasing  order  of  authority. 


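Solution  We will take $\mathbf{a}_0$ to be the normalized vector of column sums of $A$, and then we will compute the iterates in (7) until the authority vectors seem to
stabilize. We leave it for you to show that

\[
\mathbf{a}_0 = \frac{1}{\sqrt{54}}
\begin{bmatrix} 0 \\ 2 \\ 1 \\ 1 \\ 5 \\ 3 \\ 1 \\ 3 \\ 0 \\ 2 \end{bmatrix}
\approx
\begin{bmatrix} 0 \\ 0.27217 \\ 0.13608 \\ 0.13608 \\ 0.68041 \\ 0.40825 \\ 0.13608 \\ 0.40825 \\ 0 \\ 0.27217 \end{bmatrix}
\]

and that

\[
(A^TA)\mathbf{a}_0 \approx
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 1 & 1 & 2 & 0 & 0 & 2 & 0 & 1 \\
0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 1 \\
0 & 2 & 1 & 1 & 5 & 0 & 0 & 2 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 3 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 2 & 1 & 1 & 2 & 0 & 0 & 3 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 & 0 & 2
\end{bmatrix}
\begin{bmatrix} 0 \\ 0.27217 \\ 0.13608 \\ 0.13608 \\ 0.68041 \\ 0.40825 \\ 0.13608 \\ 0.40825 \\ 0 \\ 0.27217 \end{bmatrix}
\approx
\begin{bmatrix} 0 \\ 3.26599 \\ 1.90516 \\ 1.90516 \\ 5.30723 \\ 1.36083 \\ 0.54433 \\ 3.67423 \\ 0 \\ 2.17732 \end{bmatrix}
\]
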

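Thus,

\[
\mathbf{a}_1 = \frac{(A^TA)\mathbf{a}_0}{\|(A^TA)\mathbf{a}_0\|}
\approx \frac{1}{8.15362}
\begin{bmatrix} 0 \\ 3.26599 \\ 1.90516 \\ 1.90516 \\ 5.30723 \\ 1.36083 \\ 0.54433 \\ 3.67423 \\ 0 \\ 2.17732 \end{bmatrix}
\approx
\begin{bmatrix} 0 \\ 0.40056 \\ 0.23366 \\ 0.23366 \\ 0.65090 \\ 0.16690 \\ 0.06676 \\ 0.45063 \\ 0 \\ 0.26704 \end{bmatrix}
\]
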

Continuing in this way yields the following authority iterates, where $\mathbf{a}_k = (A^TA)\mathbf{a}_{k-1} / \|(A^TA)\mathbf{a}_{k-1}\|$ for $k = 1, 2, \ldots, 10$:

            a_0       a_1       a_2       a_3       a_4     ...   a_9       a_10
Site 1      0         0         0         0         0             0         0
Site 2      0.27217   0.40056   0.41652   0.41918   0.41973       0.41990   0.41990
Site 3      0.13608   0.23366   0.24917   0.25233   0.25309       0.25337   0.25337
Site 4      0.13608   0.23366   0.24917   0.25233   0.25309       0.25337   0.25337
Site 5      0.68041   0.65090   0.63407   0.62836   0.62665       0.62597   0.62597
Site 6      0.40825   0.16690   0.06322   0.02372   0.00889       0.00007   0.00002
Site 7      0.13608   0.06676   0.02603   0.00981   0.00368       0.00003   0.00001
Site 8      0.40825   0.45063   0.46672   0.47050   0.47137       0.47165   0.47165
Site 9      0         0         0         0         0             0         0
Site 10     0.27217   0.26704   0.27892   0.28300   0.28416       0.28460   0.28460


The small changes between $\mathbf{a}_9$ and $\mathbf{a}_{10}$ suggest that the iterates have stabilized near a dominant eigenvector of $A^TA$. From the entries in $\mathbf{a}_{10}$ we conclude that Sites
1, 6, 7, and 9 are probably irrelevant to the search and that the remaining sites should be searched in order of decreasing importance as


Site 5, Site 8, Site 2, Site 10, Sites 3 and 4 (a tie)
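
The ranking in Example 3 can be reproduced from the matrix $A^TA$ and the column sums displayed in the solution. The following Python sketch is one way to do so; the cutoff `threshold` used to discard sites with negligible weight is a hypothetical choice made for this illustration and is not part of the text.

```python
import math

# A^T A as displayed in the solution of Example 3 (rows and columns are Sites 1-10).
AtA = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 2, 1, 1, 2, 0, 0, 2, 0, 1],
    [0, 1, 1, 1, 1, 0, 0, 1, 0, 1],
    [0, 1, 1, 1, 1, 0, 0, 1, 0, 1],
    [0, 2, 1, 1, 5, 0, 0, 2, 0, 1],
    [0, 0, 0, 0, 0, 3, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    [0, 2, 1, 1, 2, 0, 0, 3, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 1, 0, 2],
]
column_sums = [0, 2, 1, 1, 5, 3, 1, 3, 0, 2]   # column sums of A, from the solution

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

a = normalize(column_sums)          # a_0
for _ in range(10):                 # compute a_1, ..., a_10 using sequence (7)
    a = normalize(mat_vec(AtA, a))

threshold = 1e-3                    # hypothetical cutoff for "irrelevant" sites
relevant = [site for site in range(1, 11) if a[site - 1] > threshold]
ranking = sorted(relevant, key=lambda site: -a[site - 1])
print(ranking)                      # sites in decreasing order of authority
```

Python's `sorted` is stable, so the tied Sites 3 and 4 are listed in their original order.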


Concept  Review 

Adjacency  matrix 
Hub  vector 
Authority  vector 
Hub  weights 
Authority  weights 
Skills 

Find  the  initial  hub  and  authority  vectors  of  an  adjacency  matrix. 
Use  the  method  of  Example  3 to  rank  sites. 


Exercise  Set  9.3 


In  Exercises  1-2,  find  the  initial  hub  and  authority  vectors  for  the  given  adjacency  matrix  A. 

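1. Rows are indexed by the referencing site and columns by the referenced site.

\[
A = \begin{array}{c|ccc}
 & 1 & 2 & 3 \\ \hline
1 & 0 & 0 & 1 \\
2 & 1 & 0 & 1 \\
3 & 1 & 0 & 1
\end{array}
\]

Answer:

\[
\mathbf{h}_0 = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix},\quad
\mathbf{a}_0 = \begin{bmatrix} 2 \\ 0 \\ 3 \end{bmatrix}
\]

2. Rows are indexed by the referencing site and columns by the referenced site.

\[
A = \begin{array}{c|cccc}
 & 1 & 2 & 3 & 4 \\ \hline
1 & 0 & 1 & 0 & 1 \\
2 & 1 & 0 & 0 & 1 \\
3 & 1 & 0 & 0 & 1 \\
4 & 1 & 1 & 1 & 0
\end{array}
\]
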

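In Exercises 3-4, find the updated hub and authority vectors $\mathbf{h}_1$ and $\mathbf{a}_1$ for the adjacency matrix $A$.

3. The matrix in Exercise 1.

Answer:

\[
\mathbf{h}_1 \approx \begin{bmatrix} 0.39057 \\ 0.65094 \\ 0.65094 \end{bmatrix},\quad
\mathbf{a}_1 \approx \begin{bmatrix} 0.60971 \\ 0 \\ 0.79262 \end{bmatrix}
\]

4. The matrix in Exercise 2.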

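In Exercises 5-8, the adjacency matrix $A$ of an Internet search engine is given. Use the method of Example 3 to rank the sites in decreasing order of authority. In each matrix the rows are indexed by the referencing site and the columns by the referenced site.

5.
\[
A = \begin{array}{c|cccc}
 & 1 & 2 & 3 & 4 \\ \hline
1 & 0 & 0 & 1 & 0 \\
2 & 1 & 0 & 0 & 0 \\
3 & 1 & 1 & 0 & 0 \\
4 & 0 & 1 & 0 & 0
\end{array}
\]

Answer:

Sites 1 and 2 (tie); sites 3 and 4 are irrelevant

6.
\[
A = \begin{array}{c|cccc}
 & 1 & 2 & 3 & 4 \\ \hline
1 & 0 & 1 & 1 & 0 \\
2 & 0 & 0 & 1 & 0 \\
3 & 1 & 0 & 0 & 1 \\
4 & 1 & 0 & 0 & 0
\end{array}
\]
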


7.  Referenced  Site 

1 2 3 4 5 


A = 


0 1110  1 

l ° ° ° J I Ref erencing  Site 

0 0 0 0 1 3 

0 1 0 0 0 4 

0 110  0 5 


Answer: 


Site  2,  site  3,  site  4;  sites  1 and  5 are  irrelevant 

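8. Rows are indexed by the referencing site and columns by the referenced site.

\[
A = \begin{array}{c|cccccccccc}
 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline
1 & 0 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 \\
2 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
4 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 \\
5 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
6 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
7 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
8 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
9 & 0 & 1 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
10 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0
\end{array}
\]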
9.4  Comparison of Procedures for Solving Linear Systems

There  is  an  old  saying  that  “time  is  money.”  This  is  especially  true  in  industry  where  the  cost  of  solving  a linear 
system  is  generally  determined  by  the  time  it  takes  for  a computer  to  perform  the  required  computations.  This 
typically  depends  both  on  the  speed  of  the  computer  processor  and  on  the  number  of  operations  required  by  the 
algorithm. Thus, choosing the right algorithm has important financial implications in an industrial or research setting.
In  this  section  we  will  discuss  some  of  the  factors  that  affect  the  choice  of  algorithms  for  solving  large-scale  linear 
systems. 


Flops  and  the  Cost  of  Solving  a Linear  System 

In computer jargon, an arithmetic operation ($+$, $-$, $\times$, $\div$) on two real numbers is called a flop, which is an acronym
for “floating-point operation.” The total number of flops required to solve a problem, which is called the cost of the
solution, provides a convenient way of choosing between various algorithms for solving the problem. When needed,
the cost in flops can be converted to units of time or money if the speed of the computer processor and the financial
aspects of its operation are known. For example, many of today’s personal computers are capable of performing in
excess of 10 gigaflops per second (1 gigaflop $= 10^9$ flops). Thus, an algorithm that costs 1,000,000 flops would be
executed in 0.0001 seconds.

To  illustrate  how  costs  (in  flops)  can  be  computed,  let  us  count  the  number  of  flops  required  to  solve  a linear  system 
of  n equations  in  n unknowns  by  Gauss-Jordan  elimination.  For  this  purpose  we  will  need  the  following  formulas  for 
the  sum  of  the  first  n positive  integers  and  the  sum  of  the  squares  of  the  first  n positive  integers: 

\[
1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2} \tag{1}
\]

\[
1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \tag{2}
\]


Let $A\mathbf{x} = \mathbf{b}$ be a linear system of $n$ equations in $n$ unknowns to be solved by Gauss-Jordan elimination (or,
equivalently, by Gaussian elimination with back substitution). For simplicity, let us assume that $A$ is invertible and
that no row interchanges are required to reduce the augmented matrix $[A \mid \mathbf{b}]$ to row echelon form. The diagrams that
accompany the following analysis provide a convenient way of counting the operations required to introduce a
leading 1 in the first row and then zeros below it. In our operation counts, we will lump divisions and multiplications
together as “multiplications,” and we will lump additions and subtractions together as “additions.”

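Step 1. It requires $n$ flops (multiplications) to introduce the leading 1 in the first row.

\[
\left[\begin{array}{ccccc|c}
1 & \times & \times & \cdots & \times & \times \\
\bullet & \bullet & \bullet & \cdots & \bullet & \bullet \\
\bullet & \bullet & \bullet & \cdots & \bullet & \bullet \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
\bullet & \bullet & \bullet & \cdots & \bullet & \bullet
\end{array}\right]
\]

$\times$ denotes a quantity that is being computed.
$\bullet$ denotes a quantity that is not being computed.
The augmented matrix size is $n \times (n + 1)$.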

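Step 2. It requires $n$ multiplications and $n$ additions to introduce a zero below the leading 1, and there are $n - 1$ rows
below the leading 1, so the number of flops required to introduce zeros below the leading 1 is $2n(n - 1)$.

\[
\left[\begin{array}{ccccc|c}
1 & \bullet & \bullet & \cdots & \bullet & \bullet \\
0 & \times & \times & \cdots & \times & \times \\
0 & \times & \times & \cdots & \times & \times \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & \times & \times & \cdots & \times & \times
\end{array}\right]
\]
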

Combining Steps 1 and 2, the number of flops required for column 1 is

\[
n + 2n(n - 1) = 2n^2 - n
\]

The procedure for column 2 is the same as for column 1, except that now we are
dealing with one less row and one less column. Thus, the number of flops
required to introduce the leading 1 in row 2 and the zeros below it can be obtained
by replacing $n$ by $n - 1$ in the flop count for the first column. Thus, the number of
flops required for column 2 is

\[
2(n - 1)^2 - (n - 1)
\]

By the argument for column 2, the number of flops required for column 3 is

\[
2(n - 2)^2 - (n - 2)
\]

The pattern should now be clear. The total number of flops required to create the $n$
leading 1's and the associated zeros is

\[
(2n^2 - n) + [2(n-1)^2 - (n-1)] + [2(n-2)^2 - (n-2)] + \cdots + (2 - 1)
\]

which we can rewrite as

\[
2[n^2 + (n-1)^2 + \cdots + 1] - [n + (n-1) + \cdots + 1]
\]

or on applying Formulas 1 and 2 as

\[
2\,\frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)}{2} = \frac{2}{3}n^3 + \frac{1}{2}n^2 - \frac{1}{6}n
\]

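Next, let us count the number of operations required to complete the backward
phase (the back substitution).

It requires $n - 1$ multiplications and $n - 1$ additions to introduce zeros above the
leading 1 in the $n$th column, so the total number of flops required for the column
is $2(n - 1)$.

\[
\left[\begin{array}{cccccc|c}
1 & \bullet & \bullet & \cdots & \bullet & 0 & \times \\
0 & 1 & \bullet & \cdots & \bullet & 0 & \times \\
0 & 0 & 1 & \cdots & \bullet & 0 & \times \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 & \times \\
0 & 0 & 0 & \cdots & 0 & 1 & \bullet
\end{array}\right]
\]
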

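The procedure is the same as for Step 1, except that now we are dealing with one
less row. Thus, the number of flops required for the $(n - 1)$st column is $2(n - 2)$.

\[
\left[\begin{array}{cccccc|c}
1 & \bullet & \cdots & \bullet & 0 & 0 & \times \\
0 & 1 & \cdots & \bullet & 0 & 0 & \times \\
\vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0 & 0 & \times \\
0 & 0 & \cdots & 0 & 1 & 0 & \bullet \\
0 & 0 & \cdots & 0 & 0 & 1 & \bullet
\end{array}\right]
\]
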

By the argument for column $(n - 1)$, the number of flops required for column
$(n - 2)$ is $2(n - 3)$.

The pattern should now be clear. The total number of flops to complete the
backward phase is

\[
2(n - 1) + 2(n - 2) + 2(n - 3) + \cdots + 2(n - n) = 2[n^2 - (1 + 2 + \cdots + n)]
\]

which we can rewrite using Formula 1 as

\[
2\left(n^2 - \frac{n(n+1)}{2}\right) = n^2 - n
\]

In summary, we have shown that for Gauss-Jordan elimination the number of flops required for the forward and
backward phases is

\[
\text{flops for forward phase} = \frac{2}{3}n^3 + \frac{1}{2}n^2 - \frac{1}{6}n \tag{3}
\]

\[
\text{flops for backward phase} = n^2 - n \tag{4}
\]

Thus, the total cost of solving a linear system by Gauss-Jordan elimination is

\[
\text{flops for both phases} = \frac{2}{3}n^3 + \frac{3}{2}n^2 - \frac{7}{6}n \tag{5}
\]

Cost  Estimates  for  Solving  Large  Linear  Systems 


It is a property of polynomials that for large values of the independent variable the term of highest power makes the
major contribution to the value of the polynomial. Thus, for large linear systems we can use (3) and (4) to approximate
the number of flops in the forward and backward phases as

\[
\text{flops for forward phase} \approx \frac{2}{3}n^3 \tag{6}
\]

\[
\text{flops for backward phase} \approx n^2 \tag{7}
\]

This  shows  that  it  is  more  costly  to  execute  the  forward  phase  than  the  backward  phase  for  large  linear  systems. 


Indeed,  the  cost  difference  between  the  forward  and  backward  phases  can  be  enormous,  as  the  next  example  shows. 


EXAMPLE  1 Cost  of  Solving  a Large  Linear  System 

Approximate the time required to execute the forward and backward phases of Gauss-Jordan
elimination for a system of 10,000 ($= 10^4$) equations in 10,000 unknowns using a computer that can
execute 10 gigaflops per second.

We have $n = 10^4$ for the given system, so from (6) and (7) the number of gigaflops required
for the forward and backward phases is

\[
\text{gigaflops for forward phase} \approx \frac{2}{3}n^3 \times 10^{-9} = \frac{2}{3}(10^4)^3 \times 10^{-9} = \frac{2}{3} \times 10^3
\]

\[
\text{gigaflops for backward phase} \approx n^2 \times 10^{-9} = (10^4)^2 \times 10^{-9} = 10^{-1}
\]

Thus, at 10 gigaflops/s the execution times for the forward and backward phases are

\[
\text{time for forward phase} \approx \left(\frac{2}{3} \times 10^3\right) \times 10^{-1}\ \text{s} \approx 66.67\ \text{s}
\]

\[
\text{time for backward phase} \approx (10^{-1}) \times 10^{-1}\ \text{s} = 0.01\ \text{s}
\]

We leave it as an exercise for you to confirm the results in Table 1.


Table 1

Approximate Cost for an $n \times n$ Matrix $A$ with Large $n$

Algorithm                                                        Cost in Flops
Gauss-Jordan elimination (forward phase)                         $\approx \frac{2}{3}n^3$
Gauss-Jordan elimination (backward phase)                        $\approx n^2$
LU-decomposition of $A$                                          $\approx \frac{2}{3}n^3$
Forward substitution to solve $L\mathbf{y} = \mathbf{b}$         $\approx n^2$
Backward substitution to solve $U\mathbf{x} = \mathbf{y}$        $\approx n^2$
$A^{-1}$ by reducing $[A \mid I]$ to $[I \mid A^{-1}]$           $\approx 2n^3$
Compute $A^{-1}\mathbf{b}$                                       $\approx 2n^3$


Considerations  in  Choosing  an  Algorithm  for  Solving  a Linear  System 

For a single linear system $A\mathbf{x} = \mathbf{b}$ of $n$ equations in $n$ unknowns, the methods of LU-decomposition and Gauss-Jordan
elimination differ in bookkeeping but otherwise involve the same number of flops. Thus, neither method has
a cost advantage over the other. However, LU-decomposition has other advantages that make it the method of
choice:


Gauss-Jordan elimination and Gaussian elimination both use the augmented matrix $[A \mid \mathbf{b}]$, so $\mathbf{b}$ must be known.
In contrast, LU-decomposition uses only the matrix $A$, so once that decomposition is known it can be used with as
many right-hand sides as are required, one at a time.

The LU-decomposition that is computed to solve $A\mathbf{x} = \mathbf{b}$ can be used to compute $A^{-1}$, if needed, with little
additional work.

For large linear systems in which computer memory is at a premium, one can dispense with the storage of the 1's
and zeros that appear on or below the main diagonal of $U$, since those entries are known from the form of $U$. The
space that this opens up can then be used to store the entries of $L$, thereby reducing the amount of memory
required to solve the system.

If $A$ is a large matrix consisting mostly of zeros, and if the nonzero entries are concentrated in a “band” around the
main diagonal, then there are techniques that can be used to reduce the cost of LU-decomposition, giving it an
advantage over Gauss-Jordan elimination.
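
The first advantage, reusing one factorization for several right-hand sides, can be sketched as follows. This is a minimal Python illustration under stated assumptions: no row interchanges are needed, $U$ is taken to have 1's on its main diagonal (consistent with the storage remark above), and the function names `lu_decompose` and `solve_with_lu` are hypothetical. Production code would normally rely on a numerical library with pivoting instead.

```python
from fractions import Fraction

def lu_decompose(A):
    """Factor A = LU with L lower triangular and U unit upper triangular.

    Assumes that no row interchanges are required (every pivot is nonzero).
    """
    n = len(A)
    L = [[Fraction(0)] * n for _ in range(n)]
    U = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j, n):            # column j of L, on and below the diagonal
            L[i][j] = Fraction(A[i][j]) - sum(L[i][k] * U[k][j] for k in range(j))
        for i in range(j + 1, n):        # row j of U, right of the diagonal
            U[j][i] = (Fraction(A[j][i])
                       - sum(L[j][k] * U[k][i] for k in range(j))) / L[j][j]
    return L, U

def solve_with_lu(L, U, b):
    """Solve Ax = b given A = LU: forward substitution, then back substitution."""
    n = len(b)
    y = []
    for i in range(n):                   # L y = b
        y.append((Fraction(b[i]) - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    x = [Fraction(0)] * n
    for i in reversed(range(n)):         # U x = y (U has unit diagonal)
        x[i] = y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))
    return x

A = [[4, 1, 2], [1, 5, 1], [2, 1, 6]]
L, U = lu_decompose(A)                   # the expensive step, done once
x1 = solve_with_lu(L, U, [7, 7, 9])      # each new right-hand side costs
x2 = solve_with_lu(L, U, [4, 1, 2])      # only the two substitutions
print(x1, x2)
```

The factorization is the cubic-cost step in Table 1, while each call to `solve_with_lu` involves only the two quadratic-cost substitutions.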

The  cost  in  flops  for  Gaussian  elimination  is  the 
same  as  that  for  the  forward  phase  of  Gauss- 
Jordan  elimination. 


Concept  Review 

Flop 

Formula  for  the  sum  of  the  first  n positive  integers 
Formula  for  the  sum  of  the  squares  of  the  first  n positive  integers 
Cost  in  flops  for  solving  large  linear  systems  by  various  methods 
Cost  in  flops  for  inverting  a matrix  by  row  reduction 

Issues  to  consider  when  choosing  an  algorithm  to  solve  a large  linear  system 

Skills 

Compute  the  cost  of  solving  a linear  system  by  Gauss-Jordan  elimination. 

Approximate  the  time  required  to  execute  the  forward  and  backward  phases  of  Gauss-Jordan  elimination. 
Approximate  the  time  required  to  find  an  Z U-decomposition  of  a matrix. 

Approximate  the  time  required  to  find  the  inverse  of  an  invertible  matrix. 


Exercise  Set  9.4 

1.  A certain  computer  can  execute  10  gigaflops  per  second.  Use  Formula  5 to  find  the  time  required  to  solve  the 
system  using  Gauss-Jordan  elimination. 

(a)  A system  of  1000  equations  in  1000  unknowns. 

(b)  A system  of  10,000  equations  in  10,000  unknowns. 

(c)  A system  of  100,000  equations  in  100,000  unknowns. 


Answer: 


(a) $\approx 0.067$ second

(b) $\approx 66.68$ seconds

(c) $\approx 66{,}668$ seconds, or about 18.5 hours

2.  A certain  computer  can  execute  100  gigaflops  per  second.  Use  Formula  5 to  find  the  time  required  to  solve  the 
system  using  Gauss-Jordan  elimination. 

(a)  A system  of  10,000  equations  in  10,000  unknowns. 

(b)  A system  of  100,000  equations  in  100,000  unknowns. 

(c)  A system  of  1,000,000  equations  in  1,000,000  unknowns. 

3.  Today's  personal  computers  can  execute  70  gigaflops  per  second.  Use  Table  1 to  estimate  the  time  required  to 
perform  the  following  operations  on  the  invertible  10,000  x 10,000  matrix  A. 

(a)  Execute  the  forward  phase  of  Gauss-Jordan  elimination. 

(b)  Execute  the  backward  phase  of  Gauss-Jordan  elimination. 

(c) LU-decomposition of $A$.

(d) Find $A^{-1}$ by reducing $[A \mid I]$ to $[I \mid A^{-1}]$.

Answer:

(a) $\approx 9.52$ seconds

(b) $\approx 0.0014$ second

(c) $\approx 9.52$ seconds

(d) $\approx 28.6$ seconds

4. The IBM Roadrunner computer can operate at speeds in excess of 1 petaflop per second (1 petaflop $= 10^{15}$
flops). Use Table 1 to estimate the time required to perform the following operations on the invertible
$100{,}000 \times 100{,}000$ matrix $A$.

(a) Execute the forward phase of Gauss-Jordan elimination.

(b) Execute the backward phase of Gauss-Jordan elimination.

(c) LU-decomposition of $A$.

(d) Find $A^{-1}$ by reducing $[A \mid I]$ to $[I \mid A^{-1}]$.

5. (a) Approximate the time required to execute the forward phase of Gauss-Jordan elimination for a system of
100,000 equations in 100,000 unknowns using a computer that can execute 1 gigaflop per second. Do the
same for the backward phase. (See Table 1.)

(b) How many gigaflops per second must a computer be able to execute to find the LU-decomposition of a
matrix of size $10{,}000 \times 10{,}000$ in less than 0.5 s? (See Table 1.)

Answer:

(a) $\approx 6.67 \times 10^5$ s for forward phase, 10 s for backward phase

(b) 1334


6. About how many teraflops per second must a computer be able to execute to find the inverse of a matrix of size
$100{,}000 \times 100{,}000$ in less than 0.5 s? (1 teraflop $= 10^{12}$ flops.)

In Exercises 7-10, $A$ and $B$ are $n \times n$ matrices and $c$ is a real number.

7. How many flops are required to compute $cA$?

Answer:

$n^2$ flops

8. How many flops are required to compute $A + B$?

9. How many flops are required to compute $AB$?

Answer:

$2n^3 - n^2$ flops

10. If $A$ is a diagonal matrix and $k$ is a positive integer, how many flops are required to compute $A^k$?
9.5  Singular Value Decomposition

In this section we will discuss an extension of the diagonalization theory for $n \times n$ symmetric matrices to general
$m \times n$ matrices. The results that we will develop in this section have applications to compression, storage, and
transmission of digitized information and form the basis for many of the best computational algorithms that are
currently available for solving linear systems.


Decompositions  of  Square  Matrices 

We saw in Formula 2 of Section 7.2 that every symmetric matrix $A$ can be expressed as

\[
A = PDP^T \tag{1}
\]

where $P$ is an $n \times n$ orthogonal matrix of eigenvectors of $A$, and $D$ is the diagonal matrix whose diagonal entries are
the eigenvalues corresponding to the column vectors of $P$. In this section we will call (1) an eigenvalue
decomposition of $A$ (abbreviated EVD of $A$).

If an $n \times n$ matrix $A$ is not symmetric, then it does not have an eigenvalue decomposition, but it does have a
Hessenberg decomposition

\[
A = PHP^T
\]

in which $P$ is an orthogonal matrix and $H$ is in upper Hessenberg form (Theorem 7.2.4).

Moreover, if $A$ has real eigenvalues, then it has a Schur decomposition

\[
A = PSP^T
\]

in which $P$ is an orthogonal matrix and $S$ is upper triangular (Theorem 7.2.3).

The eigenvalue, Hessenberg, and Schur decompositions are important in numerical algorithms not only because the
matrices D, H, and S have simpler forms than A, but also because the orthogonal matrices that appear in these
factorizations do not magnify roundoff error. To see why this is so, suppose that $\mathbf{x}$ is a column vector whose entries
are known exactly and that

$$\hat{\mathbf{x}} = \mathbf{x} + \mathbf{e}$$

is the vector that results when roundoff error is present in the entries of $\mathbf{x}$.

If P is an orthogonal matrix, then the length-preserving property of orthogonal transformations implies that

$$\|P\hat{\mathbf{x}} - P\mathbf{x}\| = \|\hat{\mathbf{x}} - \mathbf{x}\| = \|\mathbf{e}\|$$

which tells us that the error in approximating $P\mathbf{x}$ by $P\hat{\mathbf{x}}$ has the same magnitude as the error in approximating $\mathbf{x}$ by
$\hat{\mathbf{x}}$.
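The length-preserving property can be checked numerically. The following is a minimal sketch, assuming a 2 x 2 rotation matrix as the orthogonal matrix P and made-up values for the exact vector and the error vector (none of these numbers come from the text).

```python
import math

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def norm(x):
    """Euclidean length of a vector."""
    return math.sqrt(sum(xi * xi for xi in x))

# Hypothetical orthogonal matrix: rotation through an arbitrary angle.
theta = 0.7
P = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

x = [3.0, -4.0]        # "exact" vector (illustrative values)
e = [1e-3, -2e-3]      # roundoff error (illustrative values)
x_hat = [xi + ei for xi, ei in zip(x, e)]

err_before = norm(e)   # ||x_hat - x||
Px, Px_hat = matvec(P, x), matvec(P, x_hat)
err_after = norm([a - b for a, b in zip(Px_hat, Px)])  # ||P x_hat - P x||
```

Up to floating-point rounding, `err_before` and `err_after` agree, which is the statement that multiplying by P does not magnify the error.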

There are two main paths that one might follow in looking for other kinds of decompositions of a general square
matrix A: One might look for decompositions of the form

$$A = PJP^{-1}$$

in which P is invertible but not necessarily orthogonal, or one might look for decompositions of the form

$$A = U\Sigma V^T$$

in which U and V are orthogonal but not necessarily the same. The first path leads to decompositions in which J is
either diagonal or a certain kind of block diagonal matrix, called a Jordan canonical form in honor of the French
mathematician Camille Jordan (see p. 510). Jordan canonical forms, which we will not consider in this text, are
important theoretically and in certain applications, but they are of lesser importance numerically because of the
roundoff difficulties that result from the lack of orthogonality in P. In this section we will focus on the second path.


Singular  Values 

Since matrix products of the form $A^TA$ will play an important role in our work, we will begin with two basic
theorems about them.


THEOREM 9.5.1

If A is an $m \times n$ matrix, then:

(a) A and $A^TA$ have the same null space.

(b) A and $A^TA$ have the same row space.

(c) $A^T$ and $A^TA$ have the same column space.

(d) A and $A^TA$ have the same rank.


We will prove part (a) and leave the remaining proofs for the exercises.

Proof (a) We must show that every solution of $A\mathbf{x} = \mathbf{0}$ is a solution of $A^TA\mathbf{x} = \mathbf{0}$, and conversely. If $\mathbf{x}_0$ is any
solution of $A\mathbf{x} = \mathbf{0}$, then $\mathbf{x}_0$ is also a solution of $A^TA\mathbf{x} = \mathbf{0}$ since

$$A^TA\mathbf{x}_0 = A^T(A\mathbf{x}_0) = A^T\mathbf{0} = \mathbf{0}$$

Conversely, if $\mathbf{x}_0$ is any solution of $A^TA\mathbf{x} = \mathbf{0}$, then $\mathbf{x}_0$ is in the null space of $A^TA$ and hence is orthogonal to all
vectors in the row space of $A^TA$ by part (a) of Theorem 4.8.10.

However, $A^TA$ is symmetric, so $\mathbf{x}_0$ is also orthogonal to every vector in the column space of $A^TA$. In particular, $\mathbf{x}_0$
must be orthogonal to the vector $(A^TA)\mathbf{x}_0$; that is,

$$\mathbf{x}_0 \cdot (A^TA)\mathbf{x}_0 = 0$$

Using the first formula in Table 1 of Section 3.2 and properties of the transpose operation we can rewrite this as

$$\mathbf{x}_0^T(A^TA)\mathbf{x}_0 = (A\mathbf{x}_0)^T(A\mathbf{x}_0) = (A\mathbf{x}_0) \cdot (A\mathbf{x}_0) = \|A\mathbf{x}_0\|^2 = 0$$

which implies that $A\mathbf{x}_0 = \mathbf{0}$, thereby proving that $\mathbf{x}_0$ is a solution of $A\mathbf{x} = \mathbf{0}$.


THEOREM 9.5.2


If A is an $m \times n$ matrix, then:

(a) $A^TA$ is orthogonally diagonalizable.

(b) The eigenvalues of $A^TA$ are nonnegative.


Proof (a) The matrix $A^TA$, being symmetric, is orthogonally diagonalizable by Theorem 7.2.1.

Proof (b) Since $A^TA$ is orthogonally diagonalizable, there is an orthonormal basis for $R^n$ consisting of
eigenvectors of $A^TA$, say $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$. If we let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the corresponding eigenvalues, then for
$1 \le i \le n$ we have

$$\|A\mathbf{v}_i\|^2 = A\mathbf{v}_i \cdot A\mathbf{v}_i = \mathbf{v}_i \cdot A^TA\mathbf{v}_i \quad \text{[Formula (26) of Section 3.2]}$$
$$= \mathbf{v}_i \cdot \lambda_i\mathbf{v}_i = \lambda_i(\mathbf{v}_i \cdot \mathbf{v}_i) = \lambda_i\|\mathbf{v}_i\|^2 = \lambda_i$$

It follows from this relationship that $\lambda_i \ge 0$.




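DEFINITION 1


If A is an $m \times n$ matrix, and if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A^TA$, then the numbers

$$\sigma_1 = \sqrt{\lambda_1}, \quad \sigma_2 = \sqrt{\lambda_2}, \quad \ldots, \quad \sigma_n = \sqrt{\lambda_n}$$

are called the singular values of A.


We will assume throughout this section that the
eigenvalues of $A^TA$ are named so that

$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$$

and hence that

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$$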

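EXAMPLE 1 Singular Values

Find the singular values of the matrix

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$

The first step is to find the eigenvalues of the matrix

$$A^TA = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$

The characteristic polynomial of $A^TA$ is

$$\lambda^2 - 4\lambda + 3 = (\lambda - 3)(\lambda - 1)$$

so the eigenvalues of $A^TA$ are $\lambda_1 = 3$ and $\lambda_2 = 1$ and the singular values of A in order of decreasing
size are

$$\sigma_1 = \sqrt{\lambda_1} = \sqrt{3}, \quad \sigma_2 = \sqrt{\lambda_2} = 1$$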
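The computation in Example 1 can be reproduced with a few lines of code. This is a minimal sketch, not a general-purpose routine: the helper name `singular_values_two_columns` is hypothetical, and it handles only matrices with two columns, for which the eigenvalues of the 2 x 2 matrix A^T A follow from the quadratic formula.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def singular_values_two_columns(A):
    """Singular values of an m x 2 matrix, in decreasing order.

    Forms G = A^T A (2 x 2, symmetric) and solves its characteristic
    polynomial  lambda^2 - tr(G) lambda + det(G) = 0.
    """
    G = matmul(transpose(A), A)
    tr = G[0][0] + G[1][1]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2
    # Eigenvalues of A^T A are nonnegative; clamp tiny negative rounding.
    return [math.sqrt(max(lam1, 0.0)), math.sqrt(max(lam2, 0.0))]

A = [[1, 1], [0, 1], [1, 0]]          # matrix of Example 1
sigma = singular_values_two_columns(A)
```

For the matrix of Example 1 the intermediate eigenvalues are 3 and 1, so `sigma` holds approximately the square root of 3 followed by 1.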


Singular  Value  Decomposition 


Before turning to the main result in this section, we will find it useful to extend the notion of a “main diagonal” to
matrices that are not square. We define the main diagonal of an $m \times n$ matrix to be the line of entries shown in
Figure 9.5.1; it starts at the upper left corner and extends diagonally as far as it can go. We will refer to the entries
on the main diagonal as the diagonal entries.


Figure 9.5.1 Main diagonal of a nonsquare matrix (shown for a wide matrix and for a tall matrix)

We are now ready to consider the main result in this section, which is concerned with a specific way of factoring a
general $m \times n$ matrix A. This factorization, called singular value decomposition (abbreviated SVD), will be given in
two forms, a brief form that captures the main idea, and an expanded form that spells out the details. The proof is
given at the end of this section.


Singular  Value  Decomposition 

If A is an $m \times n$ matrix, then A can be expressed in the form

$$A = U\Sigma V^T$$

where U and V are orthogonal matrices and $\Sigma$ is an $m \times n$ matrix whose diagonal entries are the singular
values of A and whose other entries are zero.


Harry  Bateman  (1882-1946) 


The  term  singular  value  is  apparently  due  to  the  British-born  mathematician  Harry 
Bateman,  who  used  it  in  a research  paper  published  in  1908.  Bateman  emigrated  to  the  United  States  in 
1910,  teaching  at  Bryn  Mawr  College,  Johns  Hopkins  University,  and  finally  at  the  California  Institute  of 
Technology.  Interestingly,  he  was  awarded  his  Ph.D.  in  1913  by  Johns  Hopkins  at  which  point  in  time  he 
was  already  an  eminent  mathematician  with  60  publications  to  his  name. 

[Image: Courtesy of the Archives, California Institute of Technology]


Singular  Value  Decomposition  (Expanded  Form) 

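If A is an $m \times n$ matrix of rank k, then A can be factored as

$$A = U\Sigma V^T = [\,\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_k \mid \mathbf{u}_{k+1} \;\; \cdots \;\; \mathbf{u}_m\,] \begin{bmatrix} \begin{matrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k \end{matrix} & 0_{k \times (n-k)} \\ 0_{(m-k) \times k} & 0_{(m-k) \times (n-k)} \end{bmatrix} \begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_k^T \\ \mathbf{v}_{k+1}^T \\ \vdots \\ \mathbf{v}_n^T \end{bmatrix}$$

in which U, $\Sigma$, and V have sizes $m \times m$, $m \times n$, and $n \times n$, respectively, and in which

(a) $V = [\,\mathbf{v}_1 \;\; \mathbf{v}_2 \;\; \cdots \;\; \mathbf{v}_n\,]$ orthogonally diagonalizes $A^TA$.

(b) The nonzero diagonal entries of $\Sigma$ are $\sigma_1 = \sqrt{\lambda_1}, \sigma_2 = \sqrt{\lambda_2}, \ldots, \sigma_k = \sqrt{\lambda_k}$, where $\lambda_1, \lambda_2, \ldots, \lambda_k$ are the
nonzero eigenvalues of $A^TA$ corresponding to the column vectors of V.

(c) The column vectors of V are ordered so that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k > 0$.

(d) $\mathbf{u}_i = \dfrac{A\mathbf{v}_i}{\|A\mathbf{v}_i\|} = \dfrac{1}{\sigma_i}A\mathbf{v}_i \quad (i = 1, 2, \ldots, k)$

(e) $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ is an orthonormal basis for col(A).

(f) $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k, \mathbf{u}_{k+1}, \ldots, \mathbf{u}_m\}$ is an extension of $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ to an orthonormal basis for $R^m$.


The vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k$ are called the left
singular vectors of A, and the vectors
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ are called the right singular vectors
of A.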


EXAMPLE  2 Singular  Value  Decomposition  if  A Is  Not  Square 

Find  a singular  value  decomposition  of  the  matrix 

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$

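We showed in Example 1 that the eigenvalues of $A^TA$ are $\lambda_1 = 3$ and $\lambda_2 = 1$ and that the
corresponding singular values of A are $\sigma_1 = \sqrt{3}$ and $\sigma_2 = 1$. We leave it for you to verify that

$$\mathbf{v}_1 = \begin{bmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix} \quad \text{and} \quad \mathbf{v}_2 = \begin{bmatrix} \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} \end{bmatrix}$$

are eigenvectors corresponding to $\lambda_1$ and $\lambda_2$, respectively, and that $V = [\,\mathbf{v}_1 \mid \mathbf{v}_2\,]$ orthogonally
diagonalizes $A^TA$. From part (d) of Theorem 9.5.4, the vectors

$$\mathbf{u}_1 = \frac{1}{\sigma_1}A\mathbf{v}_1 = \frac{\sqrt{3}}{3} \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{6}}{3} \\ \frac{\sqrt{6}}{6} \\ \frac{\sqrt{6}}{6} \end{bmatrix}$$

$$\mathbf{u}_2 = \frac{1}{\sigma_2}A\mathbf{v}_2 = (1) \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} \end{bmatrix} = \begin{bmatrix} 0 \\ -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix}$$

are two of the three column vectors of U. Note that $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthonormal, as expected. We could
extend the set $\{\mathbf{u}_1, \mathbf{u}_2\}$ to an orthonormal basis for $R^3$. However, the computations will be easier if
we first remove the messy radicals by multiplying $\mathbf{u}_1$ and $\mathbf{u}_2$ by appropriate scalars. Thus, we will look
for a unit vector $\mathbf{u}_3$ that is orthogonal to

$$\sqrt{6}\,\mathbf{u}_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} \quad \text{and} \quad \sqrt{2}\,\mathbf{u}_2 = \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}$$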

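To satisfy these two orthogonality conditions, the vector $\mathbf{u}_3$ must be a solution of the homogeneous
linear system

$$\begin{bmatrix} 2 & 1 & 1 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

We leave it for you to show that a general solution of this system is

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = t \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}$$

Normalizing the vector on the right yields

$$\mathbf{u}_3 = \begin{bmatrix} -\frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{bmatrix}$$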

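Thus, the singular value decomposition of A is

$$\underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} \frac{\sqrt{6}}{3} & 0 & -\frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}}_{V^T}$$

You may want to confirm the validity of this equation by multiplying out the matrices on the right side.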
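The confirmation suggested at the end of Example 2 can also be done numerically. The following is a minimal sketch in plain Python: it rebuilds the first two columns of U from the formula in part (d) of the expanded theorem, appends the unit vector found from the homogeneous system, and multiplies the three factors. The helper names are illustrative only.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

A = [[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
s = math.sqrt(2) / 2
v1, v2 = [s, s], [s, -s]                 # unit eigenvectors of A^T A
sigma = [math.sqrt(3), 1.0]              # singular values from Example 1

# Part (d): u_i = (1 / sigma_i) A v_i
u1 = [c / sigma[0] for c in matvec(A, v1)]
u2 = [c / sigma[1] for c in matvec(A, v2)]
# Unit vector orthogonal to u1 and u2 (normalized solution t(-1, 1, 1)).
r = 1 / math.sqrt(3)
u3 = [-r, r, r]

U = transpose([u1, u2, u3])              # columns u1, u2, u3
Sigma = [[sigma[0], 0.0], [0.0, sigma[1]], [0.0, 0.0]]
Vt = [v1, v2]                            # rows of V^T are v1^T and v2^T

product = matmul(matmul(U, Sigma), Vt)
max_error = max(abs(product[i][j] - A[i][j])
                for i in range(3) for j in range(2))
```

Here `max_error` is the largest entrywise difference between the product and A; it should be zero up to floating-point rounding.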


Eugenio  Beltrami  (1835-1900) 


Camille  Jordan  (1838-1922) 


Herman  Klaus  Weyl  (1885-1955) 


Gene  H.  Golub  (1932-) 


The  theory  of  singular  value  decompositions  can  be  traced  back  to  the  work  of  five 
people:  the  Italian  mathematician  Eugenio  Beltrami,  the  French  mathematician  Camille  Jordan,  the  English 
mathematician James Sylvester (see p. 34), and the German mathematicians Erhard Schmidt (see p. 360)
and Herman Weyl. More recently, the pioneering efforts of the American mathematician
Gene  Golub  produced  a stable  and  efficient  algorithm  for  computing  it.  Beltrami  and  Jordan  were  the 
progenitors  of  the  decomposition — Beltrami  gave  a proof  of  the  result  for  real,  invertible  matrices  with 
distinct  singular  values  in  1873.  Subsequently,  Jordan  refined  the  theory  and  eliminated  the  unnecessary 
restrictions  imposed  by  Beltrami.  Sylvester,  apparently  unfamiliar  with  the  work  of  Beltrami  and  Jordan, 
rediscovered  the  result  in  1889  and  suggested  its  importance.  Schmidt  was  the  first  person  to  show  that  the 
singular  value  decomposition  could  be  used  to  approximate  a matrix  by  another  matrix  with  lower  rank, 
and,  in  so  doing,  he  transformed  it  from  a mathematical  curiosity  to  an  important  practical  tool.  Weyl 
showed  how  to  find  the  lower  rank  approximations  in  the  presence  of  error. 

[Images: wikipedia (Beltrami); The Granger Collection, New York (Jordan); Courtesy Electronic Publishing
Services, Inc., New York City (Weyl); wikipedia (Golub)]


OPTIONAL 


We  conclude  this  section  with  an  optional  proof  of  Theorem  9.5.4. 


For notational simplicity we will prove this theorem in the case where A is an $n \times n$
matrix. To modify the argument for an $m \times n$ matrix you need only make the notational adjustments required to
account for the possibility that m > n or n > m.


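The matrix $A^TA$ is symmetric, so it has an eigenvalue decomposition

$$A^TA = VDV^T$$

in which the column vectors of

$$V = [\,\mathbf{v}_1 \mid \mathbf{v}_2 \mid \cdots \mid \mathbf{v}_n\,]$$

are unit eigenvectors of $A^TA$, and D is a diagonal matrix whose successive diagonal entries $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the
eigenvalues of $A^TA$ corresponding in succession to the column vectors of V. Since A is assumed to have rank k, it
follows from Theorem 9.5.1 that $A^TA$ also has rank k. It follows as well that D has rank k, since it is similar to $A^TA$
and rank is a similarity invariant. Thus, D can be expressed in the form

$$D = \begin{bmatrix} \lambda_1 & & & & & \\ & \lambda_2 & & & & \\ & & \ddots & & & \\ & & & \lambda_k & & \\ & & & & 0 & \\ & & & & & \ddots \\ & & & & & & 0 \end{bmatrix} \tag{2}$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k > 0$. Now let us consider the set of image vectors

$$\{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_n\} \tag{3}$$

This is an orthogonal set, for if $i \ne j$, then the orthogonality of $\mathbf{v}_i$ and $\mathbf{v}_j$ implies that

$$A\mathbf{v}_i \cdot A\mathbf{v}_j = \mathbf{v}_i \cdot A^TA\mathbf{v}_j = \mathbf{v}_i \cdot \lambda_j\mathbf{v}_j = \lambda_j(\mathbf{v}_i \cdot \mathbf{v}_j) = 0$$

Moreover, the first k vectors in (3) are nonzero since we showed in the proof of Theorem 9.5.2b that $\|A\mathbf{v}_i\|^2 = \lambda_i$ for
$i = 1, 2, \ldots, n$, and we have assumed that the first k diagonal entries in (2) are positive. Thus,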

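$$S = \{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_k\}$$

is an orthogonal set of nonzero vectors in the column space of A. But the column space of A has dimension k since

$$\operatorname{rank}(A) = \operatorname{rank}(A^TA) = k$$

and hence S, being a linearly independent set of k vectors, must be an orthogonal basis for col(A). If we now
normalize the vectors in S, we will obtain an orthonormal basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ for col(A) in which

$$\mathbf{u}_i = \frac{A\mathbf{v}_i}{\|A\mathbf{v}_i\|} = \frac{1}{\sqrt{\lambda_i}}A\mathbf{v}_i \quad (1 \le i \le k)$$

or, equivalently, in which

$$A\mathbf{v}_1 = \sqrt{\lambda_1}\,\mathbf{u}_1 = \sigma_1\mathbf{u}_1, \quad A\mathbf{v}_2 = \sqrt{\lambda_2}\,\mathbf{u}_2 = \sigma_2\mathbf{u}_2, \quad \ldots, \quad A\mathbf{v}_k = \sqrt{\lambda_k}\,\mathbf{u}_k = \sigma_k\mathbf{u}_k \tag{4}$$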


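It follows from Theorem 6.3.6 that we can extend this to an orthonormal basis

$$\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k, \mathbf{u}_{k+1}, \ldots, \mathbf{u}_n\}$$

for $R^n$. Now let U be the orthogonal matrix

$$U = [\,\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_k \;\; \mathbf{u}_{k+1} \;\; \cdots \;\; \mathbf{u}_n\,]$$

and let $\Sigma$ be the diagonal matrix

$$\Sigma = \begin{bmatrix} \sigma_1 & & & & & \\ & \sigma_2 & & & & \\ & & \ddots & & & \\ & & & \sigma_k & & \\ & & & & 0 & \\ & & & & & \ddots \\ & & & & & & 0 \end{bmatrix}$$

It follows from (4), and the fact that $A\mathbf{v}_j = \mathbf{0}$ for $j > k$, that

$$U\Sigma = [\,\sigma_1\mathbf{u}_1 \;\; \sigma_2\mathbf{u}_2 \;\; \cdots \;\; \sigma_k\mathbf{u}_k \;\; \mathbf{0} \;\; \cdots \;\; \mathbf{0}\,]$$
$$= [\,A\mathbf{v}_1 \;\; A\mathbf{v}_2 \;\; \cdots \;\; A\mathbf{v}_k \;\; A\mathbf{v}_{k+1} \;\; \cdots \;\; A\mathbf{v}_n\,]$$
$$= AV$$

which we can rewrite using the orthogonality of V as $A = U\Sigma V^T$.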

Concept  Review 

Eigenvalue  decomposition 
Hessenberg  decomposition 
Schur  decomposition 
Magnification  of  roundoff  error 
Properties that A and $A^TA$ have in common
$A^TA$ is orthogonally diagonalizable
Eigenvalues of $A^TA$ are nonnegative
Singular  values 

Diagonal  entries  of  a matrix  that  is  not  square 
Singular  value  decomposition 

Skills 

Find the singular values of an $m \times n$ matrix.

Find  a singular  value  decomposition  of  an  m x n matrix. 


Exercise  Set  9.5 


In Exercises 1-4, find the distinct singular values of A.

1. $A = [\,1 \;\; 2 \;\; 0\,]$


Answer:


$0, \sqrt{5}$


Answer: 


In  Exercises  5-12,  find  a singular  value  decomposition  of  A. 


Answer: 


1 l_ 

{2  {2 

^11 

{2  {2 


ft  0 [1  o' 

0 /2  L°  1 _ 


Answer: 

2 li  r j_  _2_~ 

{I  r8  0] 

j_  _2_  L°  2J 2_  j_ 

1/5J  ft 


Answer: 


A = 


2 J_ 

1 ft 

h 0 

2 J_ 

'3  f2 


ft. 

6 

2^2 

3 

.ft 


3/2  0 
0 0 


{2  {2 
1 1 

/2  /2 


10. $A = \begin{bmatrix} -2 & -1 & 2 \\ 2 & 1 & -2 \end{bmatrix}$


11. $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ -1 & 1 \end{bmatrix}$

Answer: 


A = 


-A=  0 -4= 


\ 

ft 

1 J_ 

ft  ft 

1 J_ 

ft  ft 


2 

ft 

ft 

ft 


{l  0 

0 {2 


1 0 
0 1 


12. $A = \begin{bmatrix} 6 & 4 \\ 0 & 0 \\ 4 & 0 \end{bmatrix}$

13. Prove: If A is an $m \times n$ matrix, then $A^TA$ and $AA^T$ have the same rank.

14. Prove part (d) of Theorem 9.5.1 by using part (a) of the theorem and the fact that A and $A^TA$ have n columns.

15. (a) Prove part (b) of Theorem 9.5.1 by first showing that row($A^TA$) is a subspace of row(A).

(b) Prove part (c) of Theorem 9.5.1 by using part (b).

16. Let $T: R^n \to R^m$ be a linear transformation whose standard matrix A has the singular value decomposition
$A = U\Sigma V^T$, and let $B = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ and $B' = \{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_m\}$ be the column vectors of V and U,
respectively. Show that $\Sigma = [T]_{B',B}$.

17. Show that the singular values of $A^TA$ are the squares of the singular values of A.

18. Show that if $A = U\Sigma V^T$ is a singular value decomposition of A, then U orthogonally diagonalizes $AA^T$.

True-False  Exercises 


In parts (a)-(g) determine whether the statement is true or false, and justify your answer.

(a) If A is an $m \times n$ matrix, then $A^TA$ is an $m \times m$ matrix.

Answer:

False

(b) If A is an $m \times n$ matrix, then $A^TA$ is a symmetric matrix.

Answer:

True

(c) If A is an $m \times n$ matrix, then the eigenvalues of $A^TA$ are positive real numbers.

Answer:

False

(d) If A is an $n \times n$ matrix, then A is orthogonally diagonalizable.

Answer:

False

(e) If A is an $m \times n$ matrix, then $A^TA$ is orthogonally diagonalizable.

Answer:

True

(f) The eigenvalues of $A^TA$ are the singular values of A.

Answer:

False

(g) Every $m \times n$ matrix has a singular value decomposition.

Answer:

True
9.6 Data Compression Using Singular Value Decomposition

Efficient  transmission  and  storage  of  large  quantities  of  digital  data  has  become  a major  problem  in  our  technological  world.  In  this  section 
we  will  discuss  the  role  that  singular  value  decomposition  plays  in  compressing  digital  data  so  that  it  can  be  transmitted  more  rapidly  and 
stored  in  less  space.  We  assume  here  that  you  have  read  Section  9.5  . 


Reduced  Singular  Value  Decomposition 


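Algebraically, the zero rows and columns of the matrix $\Sigma$ in Theorem 9.5.4 are superfluous and can be eliminated by multiplying out the
expression $U\Sigma V^T$ using block multiplication and the partitioning shown in that formula. The products that involve zero blocks as factors
drop out, leaving

$$A = [\,\mathbf{u}_1 \;\; \mathbf{u}_2 \;\; \cdots \;\; \mathbf{u}_k\,] \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k \end{bmatrix} \begin{bmatrix} \mathbf{v}_1^T \\ \mathbf{v}_2^T \\ \vdots \\ \mathbf{v}_k^T \end{bmatrix} \tag{1}$$

which is called a reduced singular value decomposition of A. In this text we will denote the matrices on the right side of (1) by $U_1$, $\Sigma_1$, and
$V_1^T$, respectively, and we will write this equation as

$$A = U_1\Sigma_1V_1^T \tag{2}$$

Note that the sizes of $U_1$, $\Sigma_1$, and $V_1^T$ are $m \times k$, $k \times k$, and $k \times n$, respectively, and that the matrix $\Sigma_1$ is invertible, since its diagonal
entries are positive.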


If we multiply out on the right side of (1) using the column-row rule, then we obtain

$$A = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_k\mathbf{u}_k\mathbf{v}_k^T \tag{3}$$

which is called a reduced singular value expansion of A. This result applies to all matrices, whereas the spectral decomposition [Formula
7 of Section 7.2] applies only to symmetric matrices.


It can be proved that an $m \times n$ matrix M has rank 1 if and only if it can be factored as $M = \mathbf{u}\mathbf{v}^T$, where $\mathbf{u}$ is a column vector in
$R^m$ and $\mathbf{v}$ is a column vector in $R^n$. Thus, a reduced singular value decomposition expresses a matrix A of rank k as a linear combination
of k rank 1 matrices.


EXAMPLE  1 Reduced  Singular  Value  Decomposition 


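Find a reduced singular value decomposition and a reduced singular value expansion of the matrix

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$$

In Example 2 of Section 9.5 we found the singular value decomposition

$$\underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} \frac{\sqrt{6}}{3} & 0 & -\frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} & \frac{1}{\sqrt{3}} \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}}_{V^T} \tag{4}$$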


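Since A has rank 2 (verify), it follows from (1) with k = 2 that the reduced singular value decomposition of A corresponding
to (4) is

$$\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} \frac{\sqrt{6}}{3} & 0 \\ \frac{\sqrt{6}}{6} & -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{6}}{6} & \frac{\sqrt{2}}{2} \end{bmatrix} \begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}$$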

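This yields the reduced singular value expansion

$$\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix} = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T = \sqrt{3} \begin{bmatrix} \frac{\sqrt{6}}{3} \\ \frac{\sqrt{6}}{6} \\ \frac{\sqrt{6}}{6} \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \end{bmatrix} + (1) \begin{bmatrix} 0 \\ -\frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{bmatrix} \begin{bmatrix} \frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \end{bmatrix}$$

$$= \sqrt{3} \begin{bmatrix} \frac{\sqrt{3}}{3} & \frac{\sqrt{3}}{3} \\ \frac{\sqrt{3}}{6} & \frac{\sqrt{3}}{6} \\ \frac{\sqrt{3}}{6} & \frac{\sqrt{3}}{6} \end{bmatrix} + (1) \begin{bmatrix} 0 & 0 \\ -\frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & -\frac{1}{2} \end{bmatrix}$$

Note that the matrices in the expansion have rank 1, as expected.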
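The expansion in this example can be checked by summing the two rank 1 terms numerically. This is a minimal sketch using plain Python lists; the helper names `outer`, `scale`, and `add` are illustrative and not part of any library.

```python
import math

def outer(u, v):
    """Column-times-row product u v^T, a rank 1 matrix."""
    return [[ui * vj for vj in v] for ui in u]

def scale(c, M):
    return [[c * entry for entry in row] for row in M]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

s2, s6 = math.sqrt(2), math.sqrt(6)
sigma = [math.sqrt(3), 1.0]
u = [[s6 / 3, s6 / 6, s6 / 6],      # u1
     [0.0, -s2 / 2, s2 / 2]]        # u2
v = [[s2 / 2, s2 / 2],              # v1
     [s2 / 2, -s2 / 2]]             # v2

# One rank 1 term sigma_i * u_i * v_i^T per singular value.
terms = [scale(sigma[i], outer(u[i], v[i])) for i in range(2)]
A_rebuilt = add(terms[0], terms[1])

A = [[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
max_error = max(abs(A_rebuilt[i][j] - A[i][j])
                for i in range(3) for j in range(2))
```

Keeping only `terms[0]` gives the matrix obtained by dropping the term with the smaller singular value; in this example its rows are approximately (1, 1), (1/2, 1/2), and (1/2, 1/2).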


Data  Compression  and  Image  Processing 

Singular  value  decompositions  can  be  used  to  “compress”  visual  information  for  the  purpose  of  reducing  its  required  storage  space  and 
speeding  up  its  electronic  transmission.  The  first  step  in  compressing  a visual  image  is  to  represent  it  as  a numerical  matrix  from  which  the 
visual  image  can  be  recovered  when  needed. 

For example, a black and white photograph might be scanned as a rectangular array of pixels (points) and then stored as a matrix A by
assigning each pixel a numerical value in accordance with its gray level. If 256 different gray levels are used (0 = white to 255 = black),
then the entries in the matrix would be integers between 0 and 255. The image can be recovered from the matrix A by printing or
displaying the pixels with their assigned gray levels.


Original  Reconstruction 

In  1924  the  U.S.  Federal  Bureau  of  Investigation  (FBI)  began  collecting  fingerprints  and  handprints  and  now 
has  more  than  30  million  such  prints  in  its  files.  To  reduce  the  storage  cost,  the  FBI  began  working  with  the  Los  Alamos  National 
Laboratory,  the  National  Bureau  of  Standards,  and  other  groups  in  1993  to  devise  rank  based  compression  methods  for  storing 
prints  in  digital  form.  The  following  figure  shows  an  original  fingerprint  and  a reconstruction  from  digital  data  that  was 
compressed  at  a ratio  of  26: 1 . 


If the matrix A has size $m \times n$, then one might store each of its mn entries individually. An alternative procedure is to compute the reduced
singular value decomposition

$$A = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_k\mathbf{u}_k\mathbf{v}_k^T \tag{5}$$

in which $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k$, and store the $\sigma$'s, the $\mathbf{u}$'s, and the $\mathbf{v}$'s.

When needed, the matrix A (and hence the image it represents) can be reconstructed from (5). Since each $\mathbf{u}_j$ has m entries and each $\mathbf{v}_j$ has n
entries, this method requires storage space for

$$km + kn + k = k(m + n + 1)$$

numbers. Suppose, however, that the singular values $\sigma_{r+1}, \ldots, \sigma_k$ are sufficiently small that dropping the corresponding terms in (5)
produces an acceptable approximation

$$A_r = \sigma_1\mathbf{u}_1\mathbf{v}_1^T + \sigma_2\mathbf{u}_2\mathbf{v}_2^T + \cdots + \sigma_r\mathbf{u}_r\mathbf{v}_r^T \tag{6}$$

to A and the image that it represents. We call (6) the rank r approximation of A. This matrix requires storage space for only

$$rm + rn + r = r(m + n + 1)$$

numbers, compared to mn numbers required for entry-by-entry storage of A. For example, the rank 100 approximation of a $1000 \times 1000$
matrix A requires storage for only

$$100(1000 + 1000 + 1) = 200{,}100$$

numbers, compared to the 1,000,000 numbers required for entry-by-entry storage of A, a compression of almost 80%.
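The storage counts above are simple enough to tabulate in code. The following is a small sketch; the function names `svd_storage` and `full_storage` are hypothetical and exist only for this illustration.

```python
def svd_storage(m, n, r):
    """Numbers stored for a rank r approximation of an m x n matrix:
    r singular values, r vectors u_j (m entries each),
    and r vectors v_j (n entries each)."""
    return r * (m + n + 1)

def full_storage(m, n):
    """Numbers stored for entry-by-entry storage."""
    return m * n

stored = svd_storage(1000, 1000, 100)
total = full_storage(1000, 1000)
savings = 1 - stored / total   # fraction of storage saved
```

For the 1000 x 1000 example, `stored` is 200100 and `total` is 1000000, so `savings` is 0.7999, which is the "almost 80%" quoted in the text. Note that the formula only pays off when r is small relative to m and n; for r close to min(m, n) the count r(m + n + 1) exceeds mn.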

Figure  9.6.1  shows  some  approximations  of  a digitized  mandrill  image  obtained  using  6. 


Rank  4 Rank  10  Rank  20  Rank  50  Rank  128 


Figure  9.6.1 


Concept  Review 

Reduced  singular  value  decomposition 
Reduced  singular  value  expansion 
Rank  of  an  approximation 

Skills 

Find  the  reduced  singular  value  decomposition  of  an  m x n matrix. 
Find the reduced singular value expansion of an $m \times n$ matrix.


Exercise  Set  9.6 


In Exercises 1-4, find a reduced singular value decomposition of A. [Note: Each matrix appears in Exercise Set 9.5, where you
were asked to find its (unreduced) singular value decomposition.]


1. $A = \begin{bmatrix} -2 & 2 \\ -1 & 1 \\ 2 & -2 \end{bmatrix}$


Answer: 


[3/i] 


J L 

/2  {2 


II 

c4 

“2 

2 

-1 

1 

3. 

1 

0" 

A = 

1 

1 

1 

Answer: 

2 

1 -2 


-7=  0 

ft 

J L 

ft 

1 J_ 

'ft  /2 


A = 


0 

"1  0" 

0 {2 

_0  1_ 

In  Exercises  5-8,  find  a reduced  singular  value  expansion  of  A. 
5.  The  matrix  A in  Exercise  1 . 


Answer: 


3/2 


2 

3 

1 

3 

2 
3 


/2  /2 


1 1 


6. The matrix A in Exercise 2.


7. The matrix A in Exercise 3.


Answer: 


1 


f3 


0 

1 


[1  0]  + /2  [0  1] 


1 


1 


f2 


8.  The  matrix  A in  Exercise  4. 

9. Suppose A is a $200 \times 500$ matrix. How many numbers must be stored in the rank 100 approximation of A? Compare this with the
number of entries of A.

Answer: 

70,100  numbers  must  be  stored;  A has  100,000  entries 

True-False  Exercises 

In parts (a)-(c) determine whether the statement is true or false, and justify your answer. Assume that $U_1\Sigma_1V_1^T$ is a reduced singular
value decomposition of an $m \times n$ matrix of rank k.

(a) $U_1$ has size $m \times k$.
Answer:

True

(b) $\Sigma_1$ has size $k \times k$.
Answer:

True

(c) $V_1$ has size $k \times n$.
Answer:

False 
Supplementary Exercises


1. Find an LU-decomposition of $A = \begin{bmatrix} -6 & 2 \\ 6 & 0 \end{bmatrix}$.

Answer:
1 

C\J> 

0 

1 

CO 

1 

1 

1 

Cs] 

1 

l 

0 2_ 

2. Find the LDU-decomposition of the matrix A in Exercise 1.


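3. Find an LU-decomposition of $A = \begin{bmatrix} 2 & 4 & 6 \\ 1 & 4 & 7 \\ 1 & 3 & 7 \end{bmatrix}$.

Answer:

$$A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$$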

4. Find the LDU-decomposition of the matrix A in Exercise 3.


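5. Let $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ and $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

(a) Identify the dominant eigenvalue of A and then find the corresponding dominant unit eigenvector $\mathbf{v}$
with positive entries.

(b) Apply the power method with Euclidean scaling to A and $\mathbf{x}_0$, stopping at $\mathbf{x}_5$. Compare your value of
$\mathbf{x}_5$ to the eigenvector $\mathbf{v}$ found in part (a).

(c) Apply the power method with maximum entry scaling to A and $\mathbf{x}_0$, stopping at $\mathbf{x}_5$. Compare your
result with the eigenvector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

Answer:

(a) $\lambda = 3, \quad \mathbf{v} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$

(b) $\mathbf{x}_5 \approx \begin{bmatrix} 0.7100 \\ 0.7041 \end{bmatrix}, \quad \mathbf{v} \approx \begin{bmatrix} 0.7071 \\ 0.7071 \end{bmatrix}$

(c) $\mathbf{x}_5 \approx \begin{bmatrix} 1 \\ 0.9918 \end{bmatrix}$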

6.  Consider  the  symmetric  matrix 


Discuss the behavior of the power sequence

$$\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_k, \ldots$$

with Euclidean scaling for a general nonzero vector $\mathbf{x}_0$. What is it about the matrix that causes the
observed behavior?

7. Suppose that a symmetric matrix A has distinct eigenvalues $\lambda_1 = 8$, $\lambda_2 = 1.4$, $\lambda_3 = 2.3$, and $\lambda_4 = -8.1$.
What can you say about the convergence of the Rayleigh quotients?

8 Til 

' Find  a singular  value  decomposition  of  A = | ^ ^ . 

9. Find a singular value decomposition of $A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}$.

Answer: 


0 

4^ 

1 

1 

1 

f2 

"2  o' 

1 

f2 

1 

f2 

0 1 

—7=  0 

0 

1 

0 0 
0 0 

1 

~f2 

1 

f2 

10.  Find  a reduced  singular  value  decomposition  and  a reduced  singular  value  expansion  of  the  matrix  A in 
Exercise  9. 


11.  Find  the  reduced  singular  value  decomposition  of  the  matrix  whose  singular  value  decomposition  is 


A = 


1 

2 

1 

2 

1 

'2 

1 

2 


1 

2 

1 

'2 

1 

2 

1 

'2 


24  0 0 

0 12  0 

0 0 0 

0 0 0 


2 

3 

2 

3 

1 

3 


1 

'3 

2 

3 

2 

3 


Answer: 


* M 

I 

OO  O 

s 

1 

1 1 
2 2 

'24 

0 ' 

'2  1 2' 

3 3 3 

4 -8  10 

1 1 

0 

12 

2 2 1 

1 

O 

C\1 

1 

2 2 

1 1 

L.  _l 

3 3 3 

12.  Do  orthogonally  similar  matrices  have  the  same  singular  values?  Justify  your  answer. 

13. If P is the standard matrix for the orthogonal projection of $R^n$ onto a subspace W, what can you say about
the singular values of P?
Applications of Linear Algebra

INTRODUCTION


This chapter consists of 20 applications of linear algebra. With one clearly marked
exception, each application is in its own independent section, so sections can be deleted or
permuted as desired. Each topic begins with a list of linear algebra prerequisites.

Because  our  primary  objective  in  this  chapter  is  to  present  applications  of  linear  algebra, 
proofs  are  often  omitted.  Whenever  results  from  other  fields  are  needed,  they  are  stated 
precisely,  with  motivation  where  possible,  but  usually  without  proof. 
10.1 Constructing Curves and Surfaces Through
Specified Points

In  this  section  we  describe  a technique  that  uses  determinants  to  construct  lines,  circles,  and  general  conic 
sections  through  specified  points  in  the  plane.  The  procedure  is  also  used  to  pass  planes  and  spheres  in  3-space 
through  fixed  points. 


Prerequisites 

Linear  Systems 
Determinants 
Analytic  Geometry 


The  following  theorem  follows  from  Theorem  2.3.8. 


THEOREM  10.1.1 

A homogeneous  linear  system  with  as  many  equations  as  unknowns  has  a nontrivial  solution  if  and  only 
if  the  determinant  of  the  coefficient  matrix  is  zero. 


We  will  now  show  how  this  result  can  be  used  to  determine  equations  of  various  curves  and  surfaces  through 
specified  points. 


A Line  Through  Two  Points 

Suppose that $(x_1, y_1)$ and $(x_2, y_2)$ are two distinct points in the plane. There exists a unique line

$$c_1x + c_2y + c_3 = 0 \tag{1}$$

that passes through these two points (Figure 10.1.1). Note that $c_1$, $c_2$, and $c_3$ are not all zero and that these
coefficients are unique only up to a multiplicative constant. Because $(x_1, y_1)$ and $(x_2, y_2)$ lie on the line,
substituting them in (1) gives the two equations

$$c_1x_1 + c_2y_1 + c_3 = 0 \tag{2}$$

$$c_1x_2 + c_2y_2 + c_3 = 0 \tag{3}$$

Figure 10.1.1

The three equations, (1), (2), and (3), can be grouped together and rewritten as

$$\begin{aligned} xc_1 + yc_2 + c_3 &= 0 \\ x_1c_1 + y_1c_2 + c_3 &= 0 \\ x_2c_1 + y_2c_2 + c_3 &= 0 \end{aligned}$$

which is a homogeneous linear system of three equations for $c_1$, $c_2$, and $c_3$. Because $c_1$, $c_2$, and $c_3$ are not all
zero, this system has a nontrivial solution, so the determinant of the coefficient matrix of the system must be
zero. That is,

$$\begin{vmatrix} x & y & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{vmatrix} = 0 \tag{4}$$

Consequently, every point (x, y) on the line satisfies (4); conversely, it can be shown that every point (x, y) that
satisfies (4) lies on the line.

EXAMPLE  1 Equation  of  a Line 

Find  the  equation  of  the  line  that  passes  through  the  two  points  (2,  1)  and  (3,  7). 


Substituting the coordinates of the two points into Equation (4) gives

$$\begin{vmatrix} x & y & 1 \\ 2 & 1 & 1 \\ 3 & 7 & 1 \end{vmatrix} = 0$$

The cofactor expansion of this determinant along the first row then gives

$$-6x + y + 11 = 0$$
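The cofactor expansion along the first row can be written out once and reused for any two points. This is a minimal sketch; the function name `line_through` is hypothetical, and the returned triple is the coefficient list (a, b, c) of ax + by + c = 0, determined only up to a nonzero multiple.

```python
def line_through(p, q):
    """Coefficients (a, b, c) of the line a*x + b*y + c = 0 through p and q,
    from expanding the determinant
        | x   y   1 |
        | x1  y1  1 |  =  0
        | x2  y2  1 |
    along its first row."""
    (x1, y1), (x2, y2) = p, q
    a = y1 - y2                # cofactor of x
    b = -(x1 - x2)             # cofactor of y
    c = x1 * y2 - x2 * y1      # cofactor of 1
    return a, b, c

coeffs = line_through((2, 1), (3, 7))
```

For the points of Example 1 this returns (-6, 1, 11), matching the equation found above.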


A Circle  Through  Three  Points 

Suppose that there are three distinct points in the plane, $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3, y_3)$, not all lying on a
straight line. From analytic geometry we know that there is a unique circle, say,

$$c_1(x^2 + y^2) + c_2x + c_3y + c_4 = 0 \tag{5}$$

that passes through them (Figure 10.1.2). Substituting the coordinates of the three points into this equation gives

$$c_1(x_1^2 + y_1^2) + c_2x_1 + c_3y_1 + c_4 = 0 \tag{6}$$

$$c_1(x_2^2 + y_2^2) + c_2x_2 + c_3y_2 + c_4 = 0 \tag{7}$$

$$c_1(x_3^2 + y_3^2) + c_2x_3 + c_3y_3 + c_4 = 0 \tag{8}$$

As before, Equations (5) through (8) form a homogeneous linear system with a nontrivial solution for $c_1$, $c_2$, $c_3$,
and $c_4$. Thus the determinant of the coefficient matrix is zero:

$$\begin{vmatrix} x^2 + y^2 & x & y & 1 \\ x_1^2 + y_1^2 & x_1 & y_1 & 1 \\ x_2^2 + y_2^2 & x_2 & y_2 & 1 \\ x_3^2 + y_3^2 & x_3 & y_3 & 1 \end{vmatrix} = 0 \tag{9}$$

This is a determinant form for the equation of the circle.

EXAMPLE  2 Equation  of  a Circle 

Find the equation of the circle that passes through the three points (1, 7), (6, 2), and (4, 6).

Substituting the coordinates of the three points into Equation 9 gives

\[
\begin{vmatrix}
x^2 + y^2 & x & y & 1 \\
50 & 1 & 7 & 1 \\
40 & 6 & 2 & 1 \\
52 & 4 & 6 & 1
\end{vmatrix} = 0
\]

which reduces to

\[
10(x^2 + y^2) - 20x - 40y - 200 = 0
\]

In standard form this is

\[
(x - 1)^2 + (y - 2)^2 = 5^2
\]

Thus the circle has center (1, 2) and radius 5.
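
As a check on Example 2, the sketch below expands the determinant of Equation 9 along its first row and then completes the square to read off the center and radius. It assumes integer coordinates and uses only the standard library; `det` and `circle_through` are illustrative names, not notation from the text.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def circle_through(p1, p2, p3):
    """Return (coefficients, center, radius squared) for the circle of Equation 9."""
    rows = [[x * x + y * y, x, y, 1] for x, y in (p1, p2, p3)]
    c1, c2, c3, c4 = (
        (-1) ** j * det([row[:j] + row[j + 1:] for row in rows]) for j in range(4)
    )
    if c1 == 0:
        raise ValueError("the points are collinear, so no circle exists")
    # Completing the square in c1(x^2 + y^2) + c2*x + c3*y + c4 = 0.
    cx, cy = Fraction(-c2, 2 * c1), Fraction(-c3, 2 * c1)
    r_squared = cx * cx + cy * cy - Fraction(c4, c1)
    return (c1, c2, c3, c4), (cx, cy), r_squared


coeffs, center, r_squared = circle_through((1, 7), (6, 2), (4, 6))
print(coeffs)             # (10, -20, -40, -200)
print(center, r_squared)  # center (1, 2) as Fractions, radius squared 25
```

The test `c1 == 0` corresponds to the collinear case, in which the leading cofactor vanishes and the equation no longer describes a circle.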


A General  Conic  Section  Through  Five  Points 

In his monumental work Principia Mathematica, Isaac Newton posed and solved the following problem (Book I, Proposition 22, Problem 14): “To describe a conic that shall pass through five given points.” Newton solved this problem geometrically, as shown in Figure 10.1.3, in which he passed an ellipse through the points A, B, D, P, C; however, the methods of this section can also be applied.


The general equation of a conic section in the plane (a parabola, hyperbola, or ellipse, or degenerate forms of these curves) is given by

\[
c_1x^2 + c_2xy + c_3y^2 + c_4x + c_5y + c_6 = 0
\]

This equation contains six coefficients, but we can reduce the number to five if we divide through by any one of them that is not zero. Thus only five coefficients must be determined, so five distinct points in the plane are sufficient to determine the equation of the conic section (Figure 10.1.4). As before, the equation can be put in determinant form (see Exercise 7):

\[
\begin{vmatrix}
x^2 & xy & y^2 & x & y & 1 \\
x_1^2 & x_1y_1 & y_1^2 & x_1 & y_1 & 1 \\
x_2^2 & x_2y_2 & y_2^2 & x_2 & y_2 & 1 \\
x_3^2 & x_3y_3 & y_3^2 & x_3 & y_3 & 1 \\
x_4^2 & x_4y_4 & y_4^2 & x_4 & y_4 & 1 \\
x_5^2 & x_5y_5 & y_5^2 & x_5 & y_5 & 1
\end{vmatrix} = 0 \tag{10}
\]

Figure 10.1.4

EXAMPLE 3 Equation of an Orbit

An  astronomer  who  wants  to  determine  the  orbit  of  an  asteroid  about  the  Sun  sets  up  a Cartesian 
coordinate  system  in  the  plane  of  the  orbit  with  the  Sun  at  the  origin.  Astronomical  units  of 
measurement  are  used  along  the  axes  (1  astronomical  unit  = mean  distance  of  Earth  to  Sun  = 93 
million  miles).  By  Kepler's  first  law,  the  orbit  must  be  an  ellipse,  so  the  astronomer  makes  five 
observations  of  the  asteroid  at  five  different  times  and  finds  five  points  along  the  orbit  to  be 
(8.025,8.310),  (10.170,6.355),  (11.202,3.212),  (10.736,0.375),  (9.092,  -2.267) 

Find  the  equation  of  the  orbit. 

Substituting the coordinates of the five given points into Equation 10 and rounding to three decimal places give

\[
\begin{vmatrix}
x^2 & xy & y^2 & x & y & 1 \\
64.401 & 66.688 & 69.056 & 8.025 & 8.310 & 1 \\
103.429 & 64.630 & 40.386 & 10.170 & 6.355 & 1 \\
125.485 & 35.981 & 10.317 & 11.202 & 3.212 & 1 \\
115.262 & 4.026 & 0.141 & 10.736 & 0.375 & 1 \\
82.664 & -20.612 & 5.139 & 9.092 & -2.267 & 1
\end{vmatrix} = 0
\]

The cofactor expansion of this determinant along the first row yields

\[
386.802x^2 - 102.895xy + 446.029y^2 - 2476.443x - 1427.998y - 17109.375 = 0
\]

Figure 10.1.5 is an accurate diagram of the orbit, together with the five given points.
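
The computation in Example 3 can be reproduced as follows. This is a sketch under stated assumptions: it keeps the five observations as exact decimals (using `fractions.Fraction`) rather than rounding the products to three places as the example does, so the printed coefficients may differ slightly from those shown above. The discriminant test \(B^2 - 4AC < 0\) for an ellipse is a standard fact from analytic geometry and is not part of the example itself.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


# The five observations from Example 3, kept as exact decimals.
points = [("8.025", "8.310"), ("10.170", "6.355"), ("11.202", "3.212"),
          ("10.736", "0.375"), ("9.092", "-2.267")]
points = [(Fraction(x), Fraction(y)) for x, y in points]

# Rows 2 through 6 of Equation 10; the cofactors of row 1 are the coefficients.
rows = [[x * x, x * y, y * y, x, y, Fraction(1)] for x, y in points]
A, B, C, D, E, F = (
    (-1) ** j * det([row[:j] + row[j + 1:] for row in rows]) for j in range(6)
)


def conic(x, y):
    return A * x * x + B * x * y + C * y * y + D * x + E * y + F


residuals = [conic(x, y) for x, y in points]  # exactly zero at each data point
discriminant = B * B - 4 * A * C              # negative for an ellipse
print([round(float(v), 3) for v in (A, B, C, D, E, F)])
print(residuals, float(discriminant))
```

Each residual is exactly zero because substituting a data point for \((x, y)\) makes two rows of the determinant equal.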


Figure 10.1.5


A Plane  Through  Three  Points 


In Exercise 8 we ask you to show the following: The plane in 3-space with equation

\[
c_1x + c_2y + c_3z + c_4 = 0
\]

that passes through three noncollinear points \((x_1, y_1, z_1)\), \((x_2, y_2, z_2)\), and \((x_3, y_3, z_3)\) is given by the determinant equation

\[
\begin{vmatrix}
x & y & z & 1 \\
x_1 & y_1 & z_1 & 1 \\
x_2 & y_2 & z_2 & 1 \\
x_3 & y_3 & z_3 & 1
\end{vmatrix} = 0 \tag{11}
\]


EXAMPLE  4 Equation  of  a Plane 

The equation of the plane that passes through the three noncollinear points (1, 1, 0), (2, 0, −1), and (2, 9, 2) is

\[
\begin{vmatrix}
x & y & z & 1 \\
1 & 1 & 0 & 1 \\
2 & 0 & -1 & 1 \\
2 & 9 & 2 & 1
\end{vmatrix} = 0
\]

which reduces to

\[
2x - y + 3z - 1 = 0
\]


A Sphere  Through  Four  Points 


In Exercise 9 we ask you to show the following: The sphere in 3-space with equation

\[
c_1(x^2 + y^2 + z^2) + c_2x + c_3y + c_4z + c_5 = 0
\]

that passes through four noncoplanar points \((x_1, y_1, z_1)\), \((x_2, y_2, z_2)\), \((x_3, y_3, z_3)\), and \((x_4, y_4, z_4)\) is given by the following determinant equation:

\[
\begin{vmatrix}
x^2 + y^2 + z^2 & x & y & z & 1 \\
x_1^2 + y_1^2 + z_1^2 & x_1 & y_1 & z_1 & 1 \\
x_2^2 + y_2^2 + z_2^2 & x_2 & y_2 & z_2 & 1 \\
x_3^2 + y_3^2 + z_3^2 & x_3 & y_3 & z_3 & 1 \\
x_4^2 + y_4^2 + z_4^2 & x_4 & y_4 & z_4 & 1
\end{vmatrix} = 0 \tag{12}
\]


EXAMPLE  5 Equation  of  a Sphere 


The equation of the sphere that passes through the four points (0, 3, 2), (1, −1, 1), (2, 1, 0), and (5, 1, 3) is

\[
\begin{vmatrix}
x^2 + y^2 + z^2 & x & y & z & 1 \\
13 & 0 & 3 & 2 & 1 \\
3 & 1 & -1 & 1 & 1 \\
5 & 2 & 1 & 0 & 1 \\
35 & 5 & 1 & 3 & 1
\end{vmatrix} = 0
\]

This reduces to

\[
x^2 + y^2 + z^2 - 4x - 2y - 6z + 5 = 0
\]

which in standard form is

\[
(x - 2)^2 + (y - 1)^2 + (z - 3)^2 = 9
\]
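
The result of Example 5 can be verified by direct substitution rather than by expanding the 5 × 5 determinant. The sketch below takes the reduced equation as given, confirms that all four points satisfy it, and completes the square to recover the standard form; the names `a`, `b`, `c`, `d`, and `sphere` are illustrative.

```python
points = [(0, 3, 2), (1, -1, 1), (2, 1, 0), (5, 1, 3)]

# Coefficients of x^2 + y^2 + z^2 + a*x + b*y + c*z + d = 0 from Example 5.
a, b, c, d = -4, -2, -6, 5


def sphere(x, y, z):
    return x * x + y * y + z * z + a * x + b * y + c * z + d


print([sphere(*p) for p in points])  # [0, 0, 0, 0]

# Completing the square: center (-a/2, -b/2, -c/2),
# radius^2 = (a^2 + b^2 + c^2)/4 - d.
center = (-a / 2, -b / 2, -c / 2)
radius_squared = (a * a + b * b + c * c) / 4 - d
print(center, radius_squared)  # (2.0, 1.0, 3.0) 9.0
```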


Exercise  Set  10.1 


1. Find the equations of the lines that pass through the following points:

(a) (1, −1), (2, 2)

(b) (0, 1), (1, −1)

Answer:

(a) \(y = 3x - 4\)

(b) \(y = -2x + 1\)

2. Find the equations of the circles that pass through the following points:

(a) (2, 6), (2, 0), (5, 3)

(b) (2, −2), (3, 5), (−4, 6)

Answer:

(a) \(x^2 + y^2 - 4x - 6y + 4 = 0\) or \((x - 2)^2 + (y - 3)^2 = 9\)

(b) \(x^2 + y^2 + 2x - 4y - 20 = 0\) or \((x + 1)^2 + (y - 2)^2 = 25\)

3. Find the equation of the conic section that passes through the points (0, 0), (0, −1), (2, 0), (2, −5), and (4, −1).

Answer:

\(x^2 + 2xy + y^2 - 2x + y = 0\) (a parabola)

4. Find the equations of the planes in 3-space that pass through the following points:

(a) (1, 1, −3), (1, −1, 1), (0, −1, 2)

(b) (2, 3, 1), (2, −1, −1), (1, 2, 1)

Answer:

(a) \(x + 2y + z = 0\)

(b) \(-x + y - 2z + 1 = 0\)

5. (a) Alter Equation 11 so that it determines the plane that passes through the origin and is parallel to the plane that passes through three specified noncollinear points.

(b) Find the two planes described in part (a) corresponding to the triplets of points in Exercises 4(a) and 4(b).

Answer:

(a)
\[
\begin{vmatrix}
x & y & z & 0 \\
x_1 & y_1 & z_1 & 1 \\
x_2 & y_2 & z_2 & 1 \\
x_3 & y_3 & z_3 & 1
\end{vmatrix} = 0
\]

(b) \(x + 2y + z = 0\); \(-x + y - 2z = 0\)

6. Find the equations of the spheres in 3-space that pass through the following points:

(a) (1, 2, 3), (−1, 2, 1), (1, 0, 1), (1, 2, −1)

(b) (0, 1, −2), (1, 3, 1), (2, −1, 0), (3, 1, −1)

Answer:

(a) \(x^2 + y^2 + z^2 - 2x - 4y - 2z = -2\) or \((x - 1)^2 + (y - 2)^2 + (z - 1)^2 = 4\)

(b) \(x^2 + y^2 + z^2 - 2x - 2y = 3\) or \((x - 1)^2 + (y - 1)^2 + z^2 = 5\)

7.  Show  that  Equation  10  is  the  equation  of  the  conic  section  that  passes  through  five  given  distinct  points  in  the 
plane. 

8.  Show  that  Equation  11  is  the  equation  of  the  plane  in  3-space  that  passes  through  three  given  noncollinear 
points. 

9. Show that Equation 12 is the equation of the sphere in 3-space that passes through four given noncoplanar
points. 

10. Find a determinant equation for the parabola of the form

\[
c_1y + c_2x^2 + c_3x + c_4 = 0
\]

that passes through three given noncollinear points in the plane.

Answer:

\[
\begin{vmatrix}
y & x^2 & x & 1 \\
y_1 & x_1^2 & x_1 & 1 \\
y_2 & x_2^2 & x_2 & 1 \\
y_3 & x_3^2 & x_3 & 1
\end{vmatrix} = 0
\]

11. What does Equation 9 become if the three distinct points are collinear?

Answer:

The equation of the line through the three collinear points

12. What does Equation 11 become if the three distinct points are collinear?

Answer:

0 = 0

13.  What  does  Equation  12  become  if  the  four  points  are  coplanar? 

Answer: 

The  equation  of  the  plane  through  the  four  coplanar  points 

Technology  Exercises 

The  following  exercises  are  designed  to  be  solved  using  a technology  utility.  Typically,  this  will  be  MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a
scientific  calculator  with  some  linear  algebra  capabilities.  For  each  exercise  you  will  need  to  read  the  relevant 
documentation  for  the  particular  utility  you  are  using.  The  goal  of  these  exercises  is  to  provide  you  with  a basic 
proficiency  with  your  technology  utility.  Once  you  have  mastered  the  techniques  in  these  exercises,  you  will  be 
able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the  regular  exercise  sets. 

T1. The general equation of a quadric surface is given by

\[
a_1x^2 + a_2y^2 + a_3z^2 + a_4xy + a_5xz + a_6yz + a_7x + a_8y + a_9z + a_{10} = 0
\]

Given nine points on this surface, it may be possible to determine its equation.

(a) Show that if the nine points \((x_i, y_i, z_i)\) for i = 1, 2, 3, …, 9 lie on this surface, and if they determine uniquely the equation of this surface, then its equation can be written in determinant form as

\[
\begin{vmatrix}
x^2 & y^2 & z^2 & xy & xz & yz & x & y & z & 1 \\
x_1^2 & y_1^2 & z_1^2 & x_1y_1 & x_1z_1 & y_1z_1 & x_1 & y_1 & z_1 & 1 \\
x_2^2 & y_2^2 & z_2^2 & x_2y_2 & x_2z_2 & y_2z_2 & x_2 & y_2 & z_2 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
x_9^2 & y_9^2 & z_9^2 & x_9y_9 & x_9z_9 & y_9z_9 & x_9 & y_9 & z_9 & 1
\end{vmatrix} = 0
\]

(b) Use the result in part (a) to determine the equation of the quadric surface that passes through the points (1, 2, 3), (2, 1, 7), (0, 4, 6), (3, −1, 4), (3, 0, 11), (−1, 5, 8), (9, −8, 3), (4, 5, 3), and (−2, 6, 10).

T2. (a) A hyperplane in the n-dimensional Euclidean space \(R^n\) has an equation of the form

\[
a_1x_1 + a_2x_2 + a_3x_3 + \cdots + a_nx_n + a_{n+1} = 0
\]

where \(a_i\), i = 1, 2, 3, …, n + 1, are constants, not all zero, and \(x_i\), i = 1, 2, 3, …, n, are variables for which

\[
(x_1, x_2, x_3, \ldots, x_n) \in R^n
\]

A point

\[
(x_{10}, x_{20}, x_{30}, \ldots, x_{n0}) \in R^n
\]

lies on this hyperplane if

\[
a_1x_{10} + a_2x_{20} + a_3x_{30} + \cdots + a_nx_{n0} + a_{n+1} = 0
\]

Given that the n points \((x_{1i}, x_{2i}, x_{3i}, \ldots, x_{ni})\), i = 1, 2, 3, …, n, lie on this hyperplane and that they uniquely determine the equation of the hyperplane, show that the equation of the hyperplane can be written in determinant form as

\[
\begin{vmatrix}
x_1 & x_2 & x_3 & \cdots & x_n & 1 \\
x_{11} & x_{21} & x_{31} & \cdots & x_{n1} & 1 \\
x_{12} & x_{22} & x_{32} & \cdots & x_{n2} & 1 \\
x_{13} & x_{23} & x_{33} & \cdots & x_{n3} & 1 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
x_{1n} & x_{2n} & x_{3n} & \cdots & x_{nn} & 1
\end{vmatrix} = 0
\]

(b) Determine the equation of the hyperplane in \(R^9\) that goes through the following nine points:

(1, 2, 3, 4, 5, 6, 7, 8, 9)    (2, 3, 4, 5, 6, 7, 8, 9, 1)
(3, 4, 5, 6, 7, 8, 9, 1, 2)    (4, 5, 6, 7, 8, 9, 1, 2, 3)
(5, 6, 7, 8, 9, 1, 2, 3, 4)    (6, 7, 8, 9, 1, 2, 3, 4, 5)
(7, 8, 9, 1, 2, 3, 4, 5, 6)    (8, 9, 1, 2, 3, 4, 5, 6, 7)
(9, 1, 2, 3, 4, 5, 6, 7, 8)


10.2  Geometric  Linear  Programming 

In  this  section  we  describe  a geometric  technique  for  maximizing  or  minimizing  a linear  expression  in  two 
variables  subject  to  a set  of  linear  constraints. 


Prerequisites 

Linear  Systems 
Linear  Inequalities 


Linear  Programming 


The  study  of  linear  programming  theory  has  expanded  greatly  since  the  pioneering  work  of  George  Dantzig  in 
the  late  1940s.  Today,  linear  programming  is  applied  to  a wide  variety  of  problems  in  industry  and  science.  In 
this  section  we  present  a geometric  approach  to  the  solution  of  simple  linear  programming  problems.  Let  us 
begin  with  some  examples. 

EXAMPLE  1 Maximizing  Sales  Revenue 


A candy  manufacturer  has  130  pounds  of  chocolate-covered  cherries  and  170  pounds  of 
chocolate-covered  mints  in  stock.  He  decides  to  sell  them  in  the  form  of  two  different  mixtures. 
One  mixture  will  contain  half  cherries  and  half  mints  by  weight  and  will  sell  for  $2.00  per 
pound.  The  other  mixture  will  contain  one-third  cherries  and  two-thirds  mints  by  weight  and 
will  sell  for  $1.25  per  pound.  How  many  pounds  of  each  mixture  should  the  candy 
manufacturer  prepare  in  order  to  maximize  his  sales  revenue? 


Let the mixture of half cherries and half mints be called mix A, and let \(x_1\) be the number of pounds of this mixture to be prepared. Let the mixture of one-third cherries and two-thirds mints be called mix B, and let \(x_2\) be the number of pounds of this mixture to be prepared. Since mix A sells for $2.00 per pound and mix B sells for $1.25 per pound, the total sales z (in dollars) will be

\[
z = 2.00x_1 + 1.25x_2
\]

Since each pound of mix A contains \(\tfrac{1}{2}\) pound of cherries and each pound of mix B contains \(\tfrac{1}{3}\) pound of cherries, the total number of pounds of cherries used in both mixtures is

\[
\tfrac{1}{2}x_1 + \tfrac{1}{3}x_2
\]

Similarly, since each pound of mix A contains \(\tfrac{1}{2}\) pound of mints and each pound of mix B contains \(\tfrac{2}{3}\) pound of mints, the total number of pounds of mints used in both mixtures is

\[
\tfrac{1}{2}x_1 + \tfrac{2}{3}x_2
\]

Because the manufacturer can use at most 130 pounds of cherries and 170 pounds of mints, we must have

\[
\begin{aligned}
\tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 &\le 130 \\
\tfrac{1}{2}x_1 + \tfrac{2}{3}x_2 &\le 170
\end{aligned}
\]

Furthermore, since \(x_1\) and \(x_2\) cannot be negative numbers, we must have

\[
x_1 \ge 0 \quad\text{and}\quad x_2 \ge 0
\]

The problem can therefore be formulated mathematically as follows: Find values of \(x_1\) and \(x_2\) that maximize

\[
z = 2.00x_1 + 1.25x_2
\]

subject to

\[
\begin{aligned}
\tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 &\le 130 \\
\tfrac{1}{2}x_1 + \tfrac{2}{3}x_2 &\le 170 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

Later in this section we will show how to solve this type of mathematical problem geometrically.


EXAMPLE 2 Maximizing Annual Yield

A woman has up to $10,000 to invest. Her broker suggests investing in two bonds, A and B. Bond A is a rather risky bond with an annual yield of 10%, and bond B is a rather safe bond with an annual yield of 7%. After some consideration, she decides to invest at most $6000 in bond A, to invest at least $2000 in bond B, and to invest at least as much in bond A as in bond B. How should she invest her money in order to maximize her annual yield?

Let \(x_1\) be the number of dollars to be invested in bond A, and let \(x_2\) be the number of dollars to be invested in bond B. Since each dollar invested in bond A earns $.10 per year and each dollar invested in bond B earns $.07 per year, the total dollar amount z earned each year by both bonds is

\[
z = .10x_1 + .07x_2
\]

The constraints imposed can be formulated mathematically as follows:

\[
\begin{array}{ll}
\text{Invest no more than \$10,000:} & x_1 + x_2 \le 10{,}000 \\
\text{Invest at most \$6000 in bond A:} & x_1 \le 6000 \\
\text{Invest at least \$2000 in bond B:} & x_2 \ge 2000 \\
\text{Invest at least as much in bond A as in bond B:} & x_1 \ge x_2
\end{array}
\]

We also have the implicit assumption that \(x_1\) and \(x_2\) are nonnegative:

\[
x_1 \ge 0 \quad\text{and}\quad x_2 \ge 0
\]

Thus the complete mathematical formulation of the problem is as follows: Find values of \(x_1\) and \(x_2\) that maximize

\[
z = .10x_1 + .07x_2
\]

subject to

\[
\begin{aligned}
x_1 + x_2 &\le 10{,}000 \\
x_1 &\le 6000 \\
x_2 &\ge 2000 \\
x_1 - x_2 &\ge 0 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

EXAMPLE  3 Minimizing  Cost 


A student desires to design a breakfast of cornflakes and milk that is as economical as possible. On the basis of what he eats during his other meals, he decides that his breakfast should supply him with at least 9 grams of protein, at least 1/3 the recommended daily allowance (RDA) of vitamin D, and at least 1/4 the RDA of calcium. He finds the following nutrition and cost information on the milk and cornflakes containers:

             Milk (1/2 cup)    Cornflakes (1 ounce)
Cost         7.5 cents         5.0 cents
Protein      4 grams           2 grams
Vitamin D    1/8 of RDA        1/10 of RDA
Calcium      1/6 of RDA        None

In order not to have his mixture too soggy or too dry, the student decides to limit himself to mixtures that contain 1 to 3 ounces of cornflakes per cup of milk, inclusive. What quantities of milk and cornflakes should he use to minimize the cost of his breakfast?


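Let \(x_1\) be the quantity of milk used (measured in 1/2-cup units), and let \(x_2\) be the quantity of cornflakes used (measured in 1-ounce units). Then if z is the cost of the breakfast in cents, we may write the following.

\[
\begin{array}{ll}
\text{Cost of breakfast:} & z = 7.5x_1 + 5.0x_2 \\
\text{At least 9 grams protein:} & 4x_1 + 2x_2 \ge 9 \\
\text{At least } \tfrac{1}{3} \text{ RDA vitamin D:} & \tfrac{1}{8}x_1 + \tfrac{1}{10}x_2 \ge \tfrac{1}{3} \\
\text{At least } \tfrac{1}{4} \text{ RDA calcium:} & \tfrac{1}{6}x_1 \ge \tfrac{1}{4} \\
\text{At least 1 ounce cornflakes per cup (two } \tfrac{1}{2}\text{-cups) of milk:} & \dfrac{x_2}{x_1} \ge \dfrac{1}{2} \quad (\text{or } x_1 - 2x_2 \le 0) \\
\text{At most 3 ounces cornflakes per cup (two } \tfrac{1}{2}\text{-cups) of milk:} & \dfrac{x_2}{x_1} \le \dfrac{3}{2} \quad (\text{or } 3x_1 - 2x_2 \ge 0)
\end{array}
\]

As before, we also have the implicit assumption that \(x_1 \ge 0\) and \(x_2 \ge 0\). Thus the complete mathematical formulation of the problem is as follows: Find values of \(x_1\) and \(x_2\) that minimize

\[
z = 7.5x_1 + 5.0x_2
\]

subject to

\[
\begin{aligned}
4x_1 + 2x_2 &\ge 9 \\
\tfrac{1}{8}x_1 + \tfrac{1}{10}x_2 &\ge \tfrac{1}{3} \\
\tfrac{1}{6}x_1 &\ge \tfrac{1}{4} \\
x_1 - 2x_2 &\le 0 \\
3x_1 - 2x_2 &\ge 0 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]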


Geometric  Solution  of  Linear  Programming  Problems 

Each  of  the  preceding  three  examples  is  a special  case  of  the  following  problem. 


Problem 

Find values of \(x_1\) and \(x_2\) that either maximize or minimize

\[
z = c_1x_1 + c_2x_2 \tag{1}
\]

subject to

\[
\begin{array}{ccccc}
a_{11}x_1 & + & a_{12}x_2 & (\le)(\ge)(=) & b_1 \\
a_{21}x_1 & + & a_{22}x_2 & (\le)(\ge)(=) & b_2 \\
\vdots & & \vdots & & \vdots \\
a_{m1}x_1 & + & a_{m2}x_2 & (\le)(\ge)(=) & b_m
\end{array} \tag{2}
\]

and

\[
x_1 \ge 0, \quad x_2 \ge 0 \tag{3}
\]

In each of the m conditions of (2), any one of the symbols ≤, ≥, and = may be used.

The problem above is called the general linear programming problem in two variables. The linear function z in (1) is called the objective function. Equations 2 and 3 are called the constraints; in particular, the equations in (3) are called the nonnegativity constraints on the variables \(x_1\) and \(x_2\).

We will now show how to solve a linear programming problem in two variables graphically. A pair of values \((x_1, x_2)\) that satisfy all of the constraints is called a feasible solution. The set of all feasible solutions determines a subset of the \(x_1x_2\)-plane called the feasible region. Our desire is to find a feasible solution that maximizes the objective function. Such a solution is called an optimal solution.


To examine the feasible region of a linear programming problem, let us note that each constraint of the form

\[
a_{i1}x_1 + a_{i2}x_2 = b_i
\]

defines a line in the \(x_1x_2\)-plane, whereas each constraint of the form

\[
a_{i1}x_1 + a_{i2}x_2 \le b_i \quad\text{or}\quad a_{i1}x_1 + a_{i2}x_2 \ge b_i
\]

defines a half-plane that includes its boundary line

\[
a_{i1}x_1 + a_{i2}x_2 = b_i
\]

Thus the feasible region is always an intersection of finitely many lines and half-planes. For example, the four constraints

\[
\begin{aligned}
\tfrac{1}{2}x_1 + \tfrac{1}{3}x_2 &\le 130 \\
\tfrac{1}{2}x_1 + \tfrac{2}{3}x_2 &\le 170 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

of Example 1 define the half-planes illustrated in parts (a), (b), (c), and (d) of Figure 10.2.1. The feasible region of this problem is thus the intersection of these four half-planes, which is illustrated in Figure 10.2.1e.

Figure 10.2.1



It can be shown that the feasible region of a linear programming problem has a boundary consisting of a finite number of straight line segments. If the feasible region can be enclosed in a sufficiently large circle, it is called bounded (Figure 10.2.1e); otherwise, it is called unbounded (see Figure 10.2.5). If the feasible region is empty (contains no points), then the constraints are inconsistent and the linear programming problem has no solution (see Figure 10.2.6).

Those boundary points of a feasible region that are intersections of two of the straight line boundary segments are called extreme points. (They are also called corner points and vertex points.) For example, in Figure 10.2.1e, we see that the feasible region of Example 1 has four extreme points:

\[
(0, 0), \quad (0, 255), \quad (180, 120), \quad (260, 0) \tag{4}
\]


The  importance  of  the  extreme  points  of  a feasible  region  is  shown  by  the  following  theorem. 


THEOREM 10.2.1 Maximum and Minimum Values

If  the  feasible  region  of  a linear  programming  problem  is  nonempty  and  bounded,  then  the  objective 
function  attains  both  a maximum  and  a minimum  value,  and  these  occur  at  extreme  points  of  the 
feasible  region.  If  the  feasible  region  is  unbounded,  then  the  objective  function  may  or  may  not  attain 
a maximum  or  minimum  value;  however,  if  it  attains  a maximum  or  minimum  value,  it  does  so  at  an 
extreme  point. 


Figure 10.2.2 suggests the idea behind the proof of this theorem. Since the objective function

\[
z = c_1x_1 + c_2x_2
\]

of a linear programming problem is a linear function of \(x_1\) and \(x_2\), its level curves (the curves along which z has constant values) are straight lines. As we move in a direction perpendicular to these level curves, the objective function either increases or decreases monotonically. Within a bounded feasible region, the maximum and minimum values of z must therefore occur at extreme points, as Figure 10.2.2 indicates.


In  the  next  few  examples  we  use  Theorem  10.2.1  to  solve  several  linear  programming  problems  and  illustrate 
the  variations  in  the  nature  of  the  solutions  that  may  occur. 

EXAMPLE  4 Example  1 Revisited 

Figure 10.2.1e shows that the feasible region of Example 1 is bounded. Consequently, from Theorem 10.2.1 the objective function

\[
z = 2.00x_1 + 1.25x_2
\]

attains both its minimum and maximum values at extreme points. The four extreme points and the corresponding values of z are given in the following table.

Extreme Point (x1, x2)    Value of z = 2.00x1 + 1.25x2
(0, 0)                      0
(0, 255)                  318.75
(180, 120)                510.00
(260, 0)                  520.00

We see that the largest value of z is 520.00 and the corresponding optimal solution is (260, 0). Thus the candy manufacturer attains maximum sales of $520 when he produces 260 pounds of mixture A and none of mixture B.
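
The table in Example 4 can be generated by brute force: intersect every pair of boundary lines, keep the intersections that satisfy all constraints, and evaluate z at each. The sketch below does this for Example 1 under the assumption that every constraint is written as \(a_1x_1 + a_2x_2 \le b\); the function and variable names are illustrative, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction as Fr
from itertools import combinations

# Each constraint (a1, a2, b) means a1*x1 + a2*x2 <= b.
constraints = [
    (Fr(1, 2), Fr(1, 3), Fr(130)),  # cherries
    (Fr(1, 2), Fr(2, 3), Fr(170)),  # mints
    (Fr(-1), Fr(0), Fr(0)),         # x1 >= 0
    (Fr(0), Fr(-1), Fr(0)),         # x2 >= 0
]


def extreme_points(cons):
    """Feasible intersections of pairs of boundary lines (Cramer's rule)."""
    pts = set()
    for (a1, a2, b), (c1, c2, d) in combinations(cons, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue  # parallel boundary lines never meet
        x1 = (b * c2 - a2 * d) / det
        x2 = (a1 * d - b * c1) / det
        if all(p * x1 + q * x2 <= r for p, q, r in cons):
            pts.add((x1, x2))
    return pts


def z(point):
    return 2 * point[0] + Fr(5, 4) * point[1]


pts = extreme_points(constraints)
for p in sorted(pts):
    print(tuple(map(float, p)), float(z(p)))
best = max(pts, key=z)  # (260, 0), where z = 520
```

This enumeration is practical only for small problems in two variables, which is the setting of this section.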


EXAMPLE 5 Using Theorem 10.2.1

Find values of \(x_1\) and \(x_2\) that maximize

\[
z = x_1 + 3x_2
\]

subject to

\[
\begin{aligned}
2x_1 + 3x_2 &\le 24 \\
x_1 - x_2 &\le 7 \\
x_2 &\le 6 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

In Figure 10.2.3 we have drawn the feasible region of this problem. Since it is bounded, the maximum value of z is attained at one of the five extreme points. The values of the objective function at the five extreme points are given in the following table.

Figure 10.2.3

Extreme Point (x1, x2)    Value of z = x1 + 3x2
(0, 6)                    18
(3, 6)                    21
(9, 2)                    15
(7, 0)                     7
(0, 0)                     0

From this table, the maximum value of z is 21, which is attained at \(x_1 = 3\) and \(x_2 = 6\).


EXAMPLE  6 Using  Theorem  10.2.1 


Find values of \(x_1\) and \(x_2\) that maximize

\[
z = 4x_1 + 6x_2
\]

subject to

\[
\begin{aligned}
2x_1 + 3x_2 &\le 24 \\
x_1 - x_2 &\le 7 \\
x_2 &\le 6 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

The constraints in this problem are identical to the constraints in Example 5, so the feasible region of this problem is also given by Figure 10.2.3. The values of the objective function at the extreme points are given in the following table.

Extreme Point (x1, x2)    Value of z = 4x1 + 6x2
(0, 6)                    36
(3, 6)                    48
(9, 2)                    48
(7, 0)                    28
(0, 0)                     0

We see that the objective function attains a maximum value of 48 at two adjacent extreme points, (3, 6) and (9, 2). This shows that an optimal solution to a linear programming problem need not be unique. As we ask you to show in Exercise 10, if the objective function has the same value at two adjacent extreme points, it has the same value at all points on the straight line boundary segment connecting the two extreme points. Thus, in this example the maximum value of z is attained at all points on the straight line segment connecting the extreme points (3, 6) and (9, 2).
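
The claim about the segment joining (3, 6) and (9, 2) can be spot-checked numerically. The sketch below samples a few points of the form \(t(3, 6) + (1 - t)(9, 2)\) with \(0 \le t \le 1\), confirms each is feasible, and evaluates z there. It is an illustration for this one example, not a proof of the general statement in Exercise 10; the helper names are illustrative.

```python
from fractions import Fraction


def z(x1, x2):
    return 4 * x1 + 6 * x2


def feasible(x1, x2):
    return (2 * x1 + 3 * x2 <= 24 and x1 - x2 <= 7 and x2 <= 6
            and x1 >= 0 and x2 >= 0)


p, q = (3, 6), (9, 2)
values = []
for k in range(5):
    t = Fraction(k, 4)  # t = 0, 1/4, 1/2, 3/4, 1
    x1 = t * p[0] + (1 - t) * q[0]
    x2 = t * p[1] + (1 - t) * q[1]
    assert feasible(x1, x2)
    values.append(z(x1, x2))
print([int(v) for v in values])  # [48, 48, 48, 48, 48]
```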


EXAMPLE  7 The  Feasible  Region  Is  a Line  Segment 


Find values of \(x_1\) and \(x_2\) that minimize

\[
z = 2x_1 - x_2
\]

subject to

\[
\begin{aligned}
2x_1 + 3x_2 &= 12 \\
2x_1 - 3x_2 &\ge 0 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

In Figure 10.2.4 we have drawn the feasible region of this problem. Because one of the constraints is an equality constraint, the feasible region is a straight line segment with two extreme points. The values of z at the two extreme points are given in the following table.

Figure 10.2.4

Extreme Point (x1, x2)    Value of z = 2x1 − x2
(3, 2)                     4
(6, 0)                    12

The minimum value of z is thus 4 and is attained at \(x_1 = 3\) and \(x_2 = 2\).


EXAMPLE  8 Using  Theorem  10.2.1 

Find values of \(x_1\) and \(x_2\) that maximize

\[
z = 2x_1 + 5x_2
\]

subject to

\[
\begin{aligned}
2x_1 + x_2 &\ge 8 \\
-4x_1 + x_2 &\le 2 \\
2x_1 - 3x_2 &\le 0 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

The feasible region of this linear programming problem is illustrated in Figure 10.2.5. Since it is unbounded, we are not assured by Theorem 10.2.1 that the objective function attains a maximum value. In fact, it is easily seen that since the feasible region contains points for which both \(x_1\) and \(x_2\) are arbitrarily large and positive, the objective function

\[
z = 2x_1 + 5x_2
\]

can be made arbitrarily large and positive. This problem has no optimal solution. Instead, we say the problem has an unbounded solution.
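
To make the unboundedness in Example 8 concrete, the sketch below walks away from the extreme point (3, 2) along the boundary line \(2x_1 - 3x_2 = 0\), using points of the form \((3 + 3t, 2 + 2t)\). The choice of this particular ray is an assumption made for illustration; any direction that stays inside the region and increases z would serve.

```python
def feasible(x1, x2):
    return (2 * x1 + x2 >= 8 and -4 * x1 + x2 <= 2 and 2 * x1 - 3 * x2 <= 0
            and x1 >= 0 and x2 >= 0)


def z(x1, x2):
    return 2 * x1 + 5 * x2


# Walk from the extreme point (3, 2) along the boundary line 2*x1 - 3*x2 = 0.
values = []
for t in (0, 1, 10, 100, 1000):
    x1, x2 = 3 + 3 * t, 2 + 2 * t
    assert feasible(x1, x2)
    values.append(z(x1, x2))
print(values)  # [16, 32, 176, 1616, 16016]
```

Along this ray \(z = 16 + 16t\), so z exceeds any prescribed bound once t is large enough.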


Figure 10.2.5


EXAMPLE  9 Using  Theorem  10.2.1 

Find values of \(x_1\) and \(x_2\) that maximize

\[
z = -5x_1 + x_2
\]

subject to

\[
\begin{aligned}
2x_1 + x_2 &\ge 8 \\
-4x_1 + x_2 &\le 2 \\
2x_1 - 3x_2 &\le 0 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

The above constraints are the same as those in Example 8, so the feasible region of this problem is also given by Figure 10.2.5. In Exercise 11 we ask you to show that the objective function of this problem attains a maximum within the feasible region. By Theorem 10.2.1, this maximum must be attained at an extreme point. The values of z at the two extreme points of the feasible region are given in the following table.

Extreme Point (x1, x2)    Value of z = −5x1 + x2
(1, 6)                      1
(3, 2)                    −13

The maximum value of z is thus 1 and is attained at the extreme point \(x_1 = 1\), \(x_2 = 6\).


EXAMPLE  10  Inconsistent  Constraints 


Find values of \(x_1\) and \(x_2\) that minimize

\[
z = 3x_1 - 8x_2
\]

subject to

\[
\begin{aligned}
2x_1 - x_2 &\le 4 \\
3x_1 + 11x_2 &\le 33 \\
3x_1 + 4x_2 &\ge 24 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

As can be seen from Figure 10.2.6, the intersection of the five half-planes defined by the five constraints is empty. This linear programming problem has no feasible solutions since the constraints are inconsistent.

Figure 10.2.6 There are no points common to all five shaded half-planes.
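
The inconsistency in Example 10 can also be confirmed algebraically, without the figure. This is a different technique from the geometric one used in the example: the sketch combines the first three constraints (each written in the form \(a_1x_1 + a_2x_2 \le b\)) with nonnegative weights chosen so that both variables cancel. The weights 21, 11, and 25 were found by hand and are an assumption of this illustration.

```python
# First three constraints of Example 10, each as a1*x1 + a2*x2 <= b.
# The third, 3*x1 + 4*x2 >= 24, is multiplied by -1 to reverse the inequality.
constraints = [(2, -1, 4), (3, 11, 33), (-3, -4, -24)]
weights = [21, 11, 25]  # nonnegative multipliers, chosen by hand

coeff_x1 = sum(w * a1 for w, (a1, _, _) in zip(weights, constraints))
coeff_x2 = sum(w * a2 for w, (_, a2, _) in zip(weights, constraints))
bound = sum(w * b for w, (_, _, b) in zip(weights, constraints))
print(coeff_x1, coeff_x2, bound)  # 0 0 -153
```

Any feasible point would have to satisfy the weighted sum \(0 \cdot x_1 + 0 \cdot x_2 \le -153\), which is impossible, so no feasible point exists.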


Exercise  Set  10.2 


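1. Find values of \(x_1\) and \(x_2\) that maximize

\[
z = 3x_1 + 2x_2
\]

subject to

\[
\begin{aligned}
2x_1 + 3x_2 &\le 6 \\
2x_1 - x_2 &\ge 0 \\
x_1 &\le 2 \\
x_2 &\le 1 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

Answer:

\(x_1 = 2\), \(x_2 = \tfrac{2}{3}\); maximum value of \(z = \tfrac{22}{3}\)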


2. Find values of \(x_1\) and \(x_2\) that minimize

\[
z = 3x_1 - 5x_2
\]

subject to

2xi  -*2 

< - 

4xi  -x2 

> 

*2 

< 

*1 

> 

*2 

> 

Answer: 


No  feasible  solutions 

3. Find values of \(x_1\) and \(x_2\) that minimize

\[
z = -3x_1 + 2x_2
\]

subject to

\[
\begin{aligned}
3x_1 - x_2 &\ge -5 \\
-x_1 + x_2 &\ge 1 \\
2x_1 + 4x_2 &\ge 12 \\
x_1 &\ge 0 \\
x_2 &\ge 0
\end{aligned}
\]

Answer:

Unbounded solution

4. Solve the linear programming problem posed in Example 2.

Answer:

Invest $6000 in bond A and $4000 in bond B; the annual yield is $880.

5. Solve the linear programming problem posed in Example 3.

Answer:

7/9 cup of milk, 25/18 ounces of cornflakes; minimum cost = 335/18 ≈ 18.6¢


6. In Example 5 the constraint \(x_1 - x_2 \le 7\) is said to be nonbinding because it can be removed from the problem without affecting the solution. Likewise, the constraint \(x_2 \le 6\) is said to be binding because removing it will change the solution.

(a) Which of the remaining constraints are nonbinding and which are binding?

(b) For what values of the right-hand side of the nonbinding constraint \(x_1 - x_2 \le 7\) will this constraint become binding? For what values will the resulting feasible set be empty?

(c) For what values of the right-hand side of the binding constraint \(x_2 \le 6\) will this constraint become nonbinding? For what values will the resulting feasible set be empty?

Answer:

(a) \(x_1 \ge 0\) and \(x_2 \ge 0\) are nonbinding; \(2x_1 + 3x_2 \le 24\) is binding

(b) \(x_1 - x_2 \le v\) for \(v < -3\) is binding and for \(v < -6\) yields the empty set.

(c) \(x_2 \le v\) for \(v \ge 8\) is nonbinding and for \(v < 0\) yields the empty set.

7.  A trucking  firm  ships  the  containers  of  two  companies,  A and  B.  Each  container  from  company  A weighs 
40  pounds  and  is  2 cubic  feet  in  volume.  Each  container  from  company  B weighs  50  pounds  and  is  3 cubic 
feet  in  volume.  The  trucking  firm  charges  company  A $2.20  for  each  container  shipped  and  charges 
company  B $3.00  for  each  container  shipped.  If  one  of  the  firm's  trucks  cannot  carry  more  than  37,000 
pounds  and  cannot  hold  more  than  2000  cubic  feet,  how  many  containers  from  companies  A and  B should 
a truck  carry  to  maximize  the  shipping  charges? 

Answer: 

550  containers  from  company  A and  300  containers  from  company  B;  maximum  shipping 
charges  = $2110 

8.  Repeat  Exercise  7 if  the  trucking  firm  raises  its  price  for  shipping  a container  from  company  A to  $2.50. 

Answer: 

925  containers  from  company  A and  no  containers  from  company  B;  maximum  shipping 
charges  = $2312.50 

9. A manufacturer produces sacks of chicken feed from two ingredients, A and B. Each sack is to contain at
least 10 ounces of nutrient $N_1$, at least 8 ounces of nutrient $N_2$, and at least 12 ounces of nutrient $N_3$.
Each pound of ingredient A contains 2 ounces of nutrient $N_1$, 2 ounces of nutrient $N_2$, and 6 ounces of
nutrient $N_3$. Each pound of ingredient B contains 5 ounces of nutrient $N_1$, 3 ounces of nutrient $N_2$, and 4
ounces of nutrient $N_3$. If ingredient A costs 8 cents per pound and ingredient B costs 9 cents per pound,
how much of each ingredient should the manufacturer use in each sack of feed to minimize his costs?

Answer: 

0.4 pound of ingredient A and 2.4 pounds of ingredient B; minimum cost = 24.8¢

10. If the objective function of a linear programming problem has the same value at two adjacent extreme
points, show that it has the same value at all points on the straight line segment connecting the two
extreme points. [Hint: If $(x_1', x_2')$ and $(x_1'', x_2'')$ are any two points in the plane, a point $(x_1, x_2)$ lies on
the straight line segment connecting them if

\[ x_1 = t x_1' + (1 - t) x_1'' \]

and

\[ x_2 = t x_2' + (1 - t) x_2'' \]

where t is a number in the interval [0, 1].]

11. Show that the objective function in Example 9 attains a maximum value in the feasible set. [Hint:
Examine the level curves of the objective function.]
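As an illustration of the extreme-point method used throughout this exercise set, the following sketch checks the answers to Exercises 7 and 8. It intersects every pair of constraint boundary lines, keeps the intersection points that are feasible, and evaluates the objective function at each. The function names, the tolerance, and the encoding of constraints as triples are our own choices for this sketch and are not part of the text.

```python
from itertools import combinations

# Constraints written as a*x + b*y <= c, where x and y are the numbers of
# containers from companies A and B (data taken from Exercise 7).
CONSTRAINTS = [
    (40.0, 50.0, 37000.0),  # weight limit in pounds
    (2.0, 3.0, 2000.0),     # volume limit in cubic feet
    (-1.0, 0.0, 0.0),       # x >= 0
    (0.0, -1.0, 0.0),       # y >= 0
]

def extreme_points(constraints, tol=1e-9):
    """Intersect every pair of boundary lines and keep the feasible points."""
    points = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:          # parallel boundary lines never meet
            continue
        x = (c1 * b2 - c2 * b1) / det   # Cramer's rule
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + tol for a, b, c in constraints):
            points.append((x, y))
    return points

def maximize(price_a, price_b):
    """Return (best value, best extreme point) for z = price_a*x + price_b*y."""
    return max((price_a * x + price_b * y, (x, y))
               for x, y in extreme_points(CONSTRAINTS))

best_value, best_point = maximize(2.20, 3.00)
print(best_point, round(best_value, 2))
```

Calling `maximize(2.50, 3.00)` repeats the computation with the higher price of Exercise 8. Enumerating all pairs of boundary lines is practical only for small problems like these; it is meant to mirror the geometric method, not to replace a general linear programming solver.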


Technology  Exercises 


The  following  exercises  are  designed  to  be  solved  using  a technology  utility.  Typically,  this  will  be  MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a
scientific  calculator  with  some  linear  algebra  capabilities.  For  each  exercise  you  will  need  to  read  the 
relevant  documentation  for  the  particular  utility  you  are  using.  The  goal  of  these  exercises  is  to  provide  you 
with  a basic  proficiency  with  your  technology  utility.  Once  you  have  mastered  the  techniques  in  these 
exercises,  you  will  be  able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the  regular 
exercise  sets. 

T1. Consider the feasible region consisting of $0 \le x$, $0 \le y$ along with the set of inequalities


for $k = 0, 1, 2, \ldots, n - 1$. Maximize the objective function

$z = 3x + 4y$

assuming that (a) $n = 1$, (b) $n = 2$, (c) $n = 3$, (d) $n = 4$, (e) $n = 5$, (f) $n = 6$, (g) $n = 7$, (h) $n = 8$, (i) $n = 9$,
(j) $n = 10$, and (k) $n = 11$. (l) Next, maximize this objective function using the nonlinear feasible region,

$0 \le x$, $0 \le y$, and


(m) Let the results of parts (a) through (k) begin a sequence of values for $z_{\max}$. Do these values approach the
value determined in part (l)? Explain.

T2. Repeat Exercise T1 using the objective function z = x + y.
10.3 The Earliest Applications of Linear Algebra

Linear  systems  can  be  found  in  the  earliest  writings  of  many  ancient  civilizations.  In  this  section  we  give 
some  examples  of  the  types  of  problems  that  they  used  to  solve. 


Prerequisites 

Linear  Systems 


The  practical  problems  of  early  civilizations  included  the  measurement  of  land,  the  distribution  of  goods,  the 
tracking  of  resources  such  as  wheat  and  cattle,  and  taxation  and  inheritance  calculations.  In  many  cases,  these 
problems  led  to  linear  systems  of  equations  since  linearity  is  one  of  the  simplest  relationships  that  can  exist 
among  variables.  In  this  section  we  present  examples  from  five  diverse  ancient  cultures  illustrating  how  they 
used and solved systems of linear equations. We restrict ourselves to examples before A.D. 500. These
examples consequently predate the development of the field of algebra by Islamic/Arab mathematicians, a
field  that  ultimately  led  in  the  nineteenth  century  to  the  branch  of  mathematics  now  called  linear  algebra. 

EXAMPLE 1 Egypt (about 1650 B.C.)


Problem  40  of  the  Ahmes  Papyrus 


The  Ahmes  (or  Rhind)  Papyrus  is  the  source  of  most  of  our  information  about  ancient  Egyptian 
mathematics.  This  5-meter-long  papyrus  contains  84  short  mathematical  problems,  together 
with  their  solutions,  and  dates  from  about  1650  B.C.  Problem  40  in  this  papyrus  is  the  following: 


Divide  100  hekats  of  barley  among  five  men  in  arithmetic  progression  so  that  the  sum  of 
the  two  smallest  is  one-seventh  the  sum  of  the  three  largest. 


Let a be the least amount that any man obtains, and let d be the common difference of the terms
in the arithmetic progression. Then the other four men receive a + d, a + 2d, a + 3d, and
a + 4d hekats. The two conditions of the problem require that

\[
\begin{aligned}
a + (a + d) + (a + 2d) + (a + 3d) + (a + 4d) &= 100 \\
\tfrac{1}{7}\,[(a + 2d) + (a + 3d) + (a + 4d)] &= a + (a + d)
\end{aligned}
\]

These equations reduce to the following system of two equations in two unknowns:

\[
\begin{aligned}
5a + 10d &= 100 \\
11a - 2d &= 0
\end{aligned}
\tag{1}
\]


The solution technique described in the papyrus is known as the method of false position or
false assumption. It begins by assuming some convenient value of a (in our case a = 1),
substituting that value into the second equation, and obtaining d = 11/2. Substituting a = 1
and d = 11/2 into the left-hand side of the first equation gives 60, whereas the right-hand side
is 100. Adjusting the initial guess for a by multiplying it by 100/60 leads to the correct value
a = 5/3. Substituting a = 5/3 into the second equation then gives d = 55/6, so the
quantities of barley received by the five men are 10/6, 65/6, 120/6, 175/6, and 230/6
hekats. This technique of guessing a value of an unknown and later adjusting it has been used
by many cultures throughout the ages.
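The false-position computation above can be sketched in a few lines of code. The sketch below uses exact rational arithmetic; the function name and the default trial value are our own choices. The rescaling step works because the second equation in (1) is homogeneous, so multiplying both a and d by the same factor keeps it satisfied while scaling the left-hand side of the first equation by that factor.

```python
from fractions import Fraction

def false_position(trial_a=Fraction(1)):
    """Solve 5a + 10d = 100, 11a - 2d = 0 by guessing a and rescaling."""
    # The second equation, 11a - 2d = 0, fixes d once a is guessed.
    trial_d = Fraction(11, 2) * trial_a
    # Left-hand side of the first equation for the trial values.
    trial_total = 5 * trial_a + 10 * trial_d
    # Rescale the guess so that the left-hand side becomes 100.
    scale = Fraction(100) / trial_total
    return scale * trial_a, scale * trial_d

a, d = false_position()
shares = [a + k * d for k in range(5)]   # the five men's portions
print(a, d, [str(s) for s in shares])
```

The list `shares` holds the five portions in increasing order, expressed in lowest terms, so 120/6 appears as 20 and 230/6 as 115/3.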


EXAMPLE 2 Babylonia (1900-1600 B.C.)


Babylonian  clay  tablet  Ca  MLA  1950 


The  Old  Babylonian  Empire  flourished  in  Mesopotamia  between  1900  and  1600  B.C.  Many  clay 
tablets  containing  mathematical  tables  and  problems  survive  from  that  period,  one  of  which 
(designated  Ca  MLA  1950)  contains  the  next  problem.  The  statement  of  the  problem  is  a bit 
muddled  because  of  the  condition  of  the  tablet,  but  the  diagram  and  the  solution  on  the  tablet 
indicate  that  the  problem  is  as  follows: 


A trapezoid with an area of 320 square units is cut off from a right triangle by a line
parallel to one of its sides. The other side has length 50 units, and the height of the
trapezoid is 20 units. What are the upper and the lower widths of the trapezoid?


Let x be the lower width of the trapezoid and y its upper width. The area of the trapezoid is its
height times its average width, so $20 \times \tfrac{1}{2}(x + y) = 320$. Using similar triangles, we also have
$\dfrac{x}{50} = \dfrac{y}{30}$. The solution on the tablet uses these relations to generate the linear system

\[
\begin{aligned}
\tfrac{1}{2}(x + y) &= 16 \\
\tfrac{1}{2}(x - y) &= 4
\end{aligned}
\tag{2}
\]

Adding and subtracting these two equations then gives the solution x = 20 and y = 12.


EXAMPLE 3 China (A.D. 263)


Chiu  Chang  Suan  Shu  in  Chinese  characters 

The  most  important  treatise  in  the  history  of  Chinese  mathematics  is  the  Chiu  Chang  Suan  Shu, 
or  “The  Nine  Chapters  of  the  Mathematical  Art.”  This  treatise,  which  is  a collection  of  246 
problems  and  their  solutions,  was  assembled  in  its  final  form  by  Liu  Hui  in  A.D.  263.  Its 
contents,  however,  go  back  to  at  least  the  beginning  of  the  Han  dynasty  in  the  second  century 
B.C.  The  eighth  of  its  nine  chapters,  entitled  “The  Way  of  Calculating  by  Arrays,”  contains  18 
word problems that lead to linear systems in three to six unknowns. The general solution
procedure described is almost identical to the Gaussian elimination technique developed in
Europe in the nineteenth century by Carl Friedrich Gauss. The first problem in the eighth
chapter is the following:


There are three classes of corn, of which three bundles of the first class, two of the
second,  and  one  of  the  third  make  39  measures.  Two  of  the  first,  three  of  the  second,  and 
one  of  the  third  make  34  measures.  And  one  of  the  first,  two  of  the  second,  and  three  of 
the  third  make  26  measures.  How  many  measures  of  grain  are  contained  in  one  bundle 
of  each  class? 


Let x, y, and z be the measures of the first, second, and third classes of corn. Then the
conditions of the problem lead to the following linear system of three equations in three
unknowns:

\[
\begin{aligned}
3x + 2y + z &= 39 \\
2x + 3y + z &= 34 \\
x + 2y + 3z &= 26
\end{aligned}
\tag{3}
\]


The  solution  described  in  the  treatise  represented  the  coefficients  of  each  equation  by  an 
appropriate  number  of  rods  placed  within  squares  on  a counting  table.  Positive  coefficients 
were  represented  by  black  rods,  negative  coefficients  were  represented  by  red  rods,  and  the 
squares  corresponding  to  zero  coefficients  were  left  empty.  The  counting  table  was  laid  out  as 
follows  so  that  the  coefficients  of  each  equation  appear  in  columns  with  the  first  equation  in  the 
rightmost  column: 


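\[
\begin{array}{|c|c|c|}
\hline
1 & 2 & 3 \\ \hline
2 & 3 & 2 \\ \hline
3 & 1 & 1 \\ \hline
26 & 34 & 39 \\ \hline
\end{array}
\]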

Next,  the  numbers  of  rods  within  the  squares  were  adjusted  to  accomplish  the  following  two 
steps:  (1)  two  times  the  numbers  of  the  third  column  were  subtracted  from  three  times  the 
numbers  in  the  second  column  and  (2)  the  numbers  in  the  third  column  were  subtracted  from 
three  times  the  numbers  in  the  first  column.  The  result  was  the  following  array: 


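\[
\begin{array}{|c|c|c|}
\hline
  &   & 3 \\ \hline
4 & 5 & 2 \\ \hline
8 & 1 & 1 \\ \hline
39 & 24 & 39 \\ \hline
\end{array}
\]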

In  this  array,  four  times  the  numbers  in  the  second  column  were  subtracted  from  five  times  the 
numbers  in  the  first  column,  yielding 


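\[
\begin{array}{|c|c|c|}
\hline
   &   & 3 \\ \hline
   & 5 & 2 \\ \hline
36 & 1 & 1 \\ \hline
99 & 24 & 39 \\ \hline
\end{array}
\]

This last array is equivalent to the linear system

\[
\begin{aligned}
3x + 2y + z &= 39 \\
5y + z &= 24 \\
36z &= 99
\end{aligned}
\]

This triangular system was solved by a method equivalent to back substitution to obtain
x = 37/4, y = 17/4, and z = 11/4.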
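The column operations on the counting table can be traced with a short program. In the sketch below each column is stored as a list read from top to bottom; the variable names and the helper `combine` are our own, and exact fractions are used for the back substitution.

```python
from fractions import Fraction

# Each column is one equation, read top to bottom: coefficients of x, y, z,
# then the right-hand side. Column order matches the counting table:
# third equation on the left, first equation on the right.
left = [1, 2, 3, 26]
middle = [2, 3, 1, 34]
right = [3, 2, 1, 39]

def combine(m, col_a, n, col_b):
    """Return m*col_a - n*col_b, entry by entry."""
    return [m * a - n * b for a, b in zip(col_a, col_b)]

middle = combine(3, middle, 2, right)   # step (1): gives 0, 5, 1, 24
left = combine(3, left, 1, right)       # step (2): gives 0, 4, 8, 39
left = combine(5, left, 4, middle)      # final step: gives 0, 0, 36, 99

# Back substitution on 36z = 99, 5y + z = 24, 3x + 2y + z = 39.
z = Fraction(left[3], left[2])
y = (middle[3] - middle[2] * z) / middle[1]
x = (right[3] - right[1] * y - right[2] * z) / right[0]
print(x, y, z)
```

Multiplying columns by integers before subtracting, as the treatise does, keeps every entry of the table an integer until the final back substitution.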


EXAMPLE 4 Greece (third century B.C.)


Perhaps  the  most  famous  system  of  linear  equations  from  antiquity  is  the  one  associated  with 
the  first  part  of  Archimedes'  celebrated  Cattle  Problem.  This  problem  supposedly  was  posed  by 
Archimedes  as  a challenge  to  his  colleague  Eratosthenes.  No  solution  has  come  down  to  us 
from  ancient  times,  so  that  it  is  not  known  how,  or  even  whether,  either  of  these  two  geometers 
solved  it. 

If  thou  art  diligent  and  wise,  O stranger,  compute  the  number  of  cattle  of  the  Sun,  who 
once  upon  a time  grazed  on  the  fields  of  the  Thrinacian  isle  of  Sicily,  divided  into  four 
herds  of  different  colors,  one  milk  white,  another  glossy  black,  a third  yellow,  and  the 
last  dappled.  In  each  herd  were  bulls,  mighty  in  number  according  to  these  proportions: 
Understand,  stranger,  that  the  white  bulls  were  equal  to  a half  and  a third  of  the  black 
together  with  the  whole  of  the  yellow,  while  the  black  were  equal  to  the  fourth  part  of 
the  dappled  and  a fifth,  together  with,  once  more,  the  whole  of  the  yellow.  Observe 
further  that  the  remaining  bulls,  the  dappled,  were  equal  to  a sixth  part  of  the  white  and 
a seventh,  together  with  all  of  the  yellow.  These  were  the  proportions  of  the  cows:  The 
white  were  precisely  equal  to  the  third  part  and  a fourth  of  the  whole  herd  of  the  black; 
while the black were equal to the fourth part once more of the dappled and with it a
fifth part, when all, including the bulls, went to pasture together. Now the dappled in
four parts were equal in number to a fifth part and a sixth of the yellow herd. Finally
the yellow were in number equal to a sixth part and a seventh of the white herd. If thou
canst accurately tell, O stranger, the number of cattle of the Sun, giving separately the
number of well-fed bulls and again the number of females according to each color, thou
wouldst not be called unskilled or ignorant of numbers, but not yet shalt thou be
numbered among the wise.


Archimedes c. 287-212 B.C.


The  conventional  designation  of  the  eight  variables  in  this  problem  is 


W = number of white bulls
B = number of black bulls
Y = number of yellow bulls
D = number of dappled bulls
w = number of white cows
b = number of black cows
y = number of yellow cows
d = number of dappled cows

The  problem  can  now  be  stated  as  the  following  seven  homogeneous  equations  in  eight 
unknowns: 


1. $W = \left(\tfrac{1}{2} + \tfrac{1}{3}\right)B + Y$
(The white bulls were equal to a half and a third of the black [bulls] together with the whole
of the yellow [bulls].)

2. $B = \left(\tfrac{1}{4} + \tfrac{1}{5}\right)D + Y$
(The black [bulls] were equal to the fourth part of the dappled [bulls] and a fifth, together
with, once more, the whole of the yellow [bulls].)

3. $D = \left(\tfrac{1}{6} + \tfrac{1}{7}\right)W + Y$
(The remaining bulls, the dappled, were equal to a sixth part of the white [bulls] and a
seventh, together with all of the yellow [bulls].)

4. $w = \left(\tfrac{1}{3} + \tfrac{1}{4}\right)(B + b)$
(The white [cows] were precisely equal to the third part and a fourth of the whole herd of
the black.)

5. $b = \left(\tfrac{1}{4} + \tfrac{1}{5}\right)(D + d)$
(The black [cows] were equal to the fourth part once more of the dappled and with it a fifth
part, when all, including the bulls, went to pasture together.)

6. $d = \left(\tfrac{1}{5} + \tfrac{1}{6}\right)(Y + y)$
(The dappled [cows] in four parts [that is, in totality] were equal in number to a fifth part
and a sixth of the yellow herd.)

7. $y = \left(\tfrac{1}{6} + \tfrac{1}{7}\right)(W + w)$
(The yellow [cows] were in number equal to a sixth part and a seventh of the white herd.)


As  we  ask  you  to  show  in  the  exercises,  this  system  has  infinitely  many  solutions  of  the  form 


W = 10,366,482k
B = 7,460,514k
Y = 4,149,387k
D = 7,358,060k
w = 7,206,360k
b = 4,893,246k
y = 5,439,213k
d = 3,515,820k          (4)

where k is any real number. The values k = 1, 2, ... give infinitely many positive integer
solutions to the problem, with k = 1 giving the smallest solution.
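The stated solution can be checked directly against the seven equations. The following sketch substitutes the eight values into each equation using exact rational arithmetic; the function name `residuals` is our own, and the check verifies the solution without deriving it.

```python
from fractions import Fraction as F

# Smallest positive integer solution quoted in the text (k = 1).
W, B, Y, D = 10_366_482, 7_460_514, 4_149_387, 7_358_060   # bulls
w, b, y, d = 7_206_360, 4_893_246, 5_439_213, 3_515_820    # cows

def residuals(k=1):
    """Left side minus right side of the seven equations for multiplier k."""
    Wk, Bk, Yk, Dk = k * W, k * B, k * Y, k * D
    wk, bk, yk, dk = k * w, k * b, k * y, k * d
    return [
        Wk - ((F(1, 2) + F(1, 3)) * Bk + Yk),
        Bk - ((F(1, 4) + F(1, 5)) * Dk + Yk),
        Dk - ((F(1, 6) + F(1, 7)) * Wk + Yk),
        wk - (F(1, 3) + F(1, 4)) * (Bk + bk),
        bk - (F(1, 4) + F(1, 5)) * (Dk + dk),
        dk - (F(1, 5) + F(1, 6)) * (Yk + yk),
        yk - (F(1, 6) + F(1, 7)) * (Wk + wk),
    ]

print(all(r == 0 for r in residuals()))      # True when every equation holds
print(W + B + Y + D + w + b + y + d)         # total herd size for k = 1
```

Because the system is homogeneous, the same check succeeds for any multiplier k, which is why the solutions form a one-parameter family.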


EXAMPLE 5 India (fourth century A.D.)


Fragment  III-5-3v  of  the  Bakhshali  Manuscript 

The  Bakhshali  Manuscript  is  an  ancient  work  of  Indian/Hindu  mathematics  dating  from  around 
the  fourth  century  A.D.,  although  some  of  its  materials  undoubtedly  come  from  many  centuries 
before.  It  consists  of  about  70  leaves  or  sheets  of  birch  bark  containing  mathematical  problems 
and  their  solutions.  Many  of  its  problems  are  so-called  equalization  problems  that  lead  to 
systems  of  linear  equations.  One  such  problem  on  the  fragment  shown  is  the  following: 


One  merchant  has  seven  asava  horses,  a second  has  nine  haya  horses,  and  a third  has 
ten  camels.  They  are  equally  well  off  in  the  value  of  their  animals  if  each  gives  two 
animals,  one  to  each  of  the  others.  Find  the  price  of  each  animal  and  the  total  value  of 
the  animals  possessed  by  each  merchant. 


Let x be the price of an asava horse, let y be the price of a haya horse, let z be the price of a
camel, and let K be the total value of the animals possessed by each merchant. Then the
conditions of the problem lead to the following system of equations:

\[
\begin{aligned}
5x + y + z &= K \\
x + 7y + z &= K \\
x + y + 8z &= K
\end{aligned}
\tag{5}
\]

The method of solution described in the manuscript begins by subtracting the quantity
(x + y + z) from both sides of the three equations to obtain 4x = 6y = 7z = K - (x + y + z).
This shows that if the prices x, y, and z are to be integers, then the quantity K - (x + y + z)
must be an integer that is divisible by 4, 6, and 7. The manuscript takes the product of these
three numbers, or 168, for the value of K - (x + y + z), which yields x = 42, y = 28, and
z = 24 for the prices and K = 262 for the total value. (See Exercise 6 for more solutions to this
problem.)
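The manuscript's procedure amounts to choosing a common value for 4x, 6y, and 7z and reading off the prices. The sketch below reproduces the manuscript's choice (the product 168) and, as our own addition, also tries the least common multiple of 4, 6, and 7; the function name `equalization` is hypothetical.

```python
from functools import reduce
from math import gcd

def equalization(common):
    """Solve 4x = 6y = 7z = common, then recover K = common + x + y + z."""
    x, y, z = common // 4, common // 6, common // 7
    return x, y, z, common + x + y + z

manuscript = equalization(4 * 6 * 7)               # product of 4, 6, and 7

# Least common multiple of 4, 6, and 7, built from pairwise gcds.
lcm = reduce(lambda a, b: a * b // gcd(a, b), (4, 6, 7))
smallest = equalization(lcm)

print(manuscript, smallest)
```

Any common multiple of 4, 6, and 7 yields integer prices, so the product is a convenient choice but not the only one; the least common multiple gives the smallest integer prices.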


Exercise  Set  10.3 

1.  The  following  lines  from  Book  12  of  Homer's  Odyssey  relate  a precursor  of  Archimedes'  Cattle  Problem: 

Thou  shalt  ascend  the  isle  triangular, 

Where  many  oxen  of  the  Sun  are  fed, 

And  fatted  flocks.  Of  oxen  fifty  head 
In  every  herd  feed,  and  their  herds  are  seven; 

And  of  his  fat  flocks  is  their  number  even. 

The  last  line  means  that  there  are  as  many  sheep  in  all  the  flocks  as  there  are  oxen  in  all  the  herds.  What  is 
the  total  number  of  oxen  and  sheep  that  belong  to  the  god  of  the  Sun?  (This  was  a difficult  problem  in 
Homer's  day.) 

Answer: 

700 

2.  Solve  the  following  problems  from  the  Bakhshali  Manuscript. 

(a)  B possesses  two  times  as  much  as  A;  C has  three  times  as  much  as  A and  B together;  D has  four  times 
as  much  as  A,  B,  and  C together.  Their  total  possessions  are  300.  What  is  the  possession  of  A? 

(b)  B gives  2 times  as  much  as  A;  C gives  3 times  as  much  as  B;  D gives  4 times  as  much  as  C.  Their  total 
gift  is  132.  What  is  the  gift  of  A? 

Answer: 

(a)  5 

(b)  4 

3.  A problem  on  a Babylonian  tablet  requires  finding  the  length  and  width  of  a rectangle  given  that  the  length 
and  the  width  add  up  to  10,  while  the  length  and  one-fourth  of  the  width  add  up  to  7.  The  solution 
provided  on  the  tablet  consists  of  the  following  four  statements: 


Multiply  7 by  4 to  obtain  28. 


Take  away  10  from  28  to  obtain  18. 

Take one-third of 18 to obtain 6, the length.

Take away 6 from 10 to obtain 4, the width.

Explain  how  these  steps  lead  to  the  answer. 

4. The following two problems are from “The Nine Chapters of the Mathematical Art.” Solve them using the
array technique described in Example 3.

(a)  Five  oxen  and  two  sheep  are  worth  10  units  and  two  oxen  and  five  sheep  are  worth  8 units.  What  is  the 
value  of  each  ox  and  sheep? 

(b) There are three kinds of corn. The grains contained in two, three, and four bundles, respectively, of
these three classes of corn, are not sufficient to make a whole measure. However, if we added to them
one bundle of the second, third, and first classes, respectively, then the grains would become one full
measure in each case. How many measures of grain does each bundle of the different classes contain?


Answer: 


(a) Ox, 34/21 units; sheep, 20/21 unit

(b) First kind, 9/25 measure; second kind, 7/25 measure; third kind, 4/25 measure

5. The problem in part (a) is known as the “Flower of Thymaridas,” named after a Pythagorean of the fourth
century B.C.

(a) Given the n numbers $a_1, a_2, \ldots, a_n$, solve for $x_1, x_2, \ldots, x_n$ in the following linear system:

\[
\begin{aligned}
x_1 + x_2 + \cdots + x_n &= a_1 \\
x_1 + x_2 &= a_2 \\
x_1 + x_3 &= a_3 \\
&\ \,\vdots \\
x_1 + x_n &= a_n
\end{aligned}
\]

(b)  Identify  a problem  in  this  exercise  set  that  fits  the  pattern  in  part  (a),  and  solve  it  using  your  general 
solution. 


Answer: 


(a) $x_1 = \dfrac{(a_2 + a_3 + \cdots + a_n) - a_1}{n - 2}$, $x_i = a_i - x_1$, $i = 2, 3, \ldots, n$

(b) Exercise 7(b); gold, 30½ minae; brass, 9½ minae; tin, 14½ minae; iron, 5½ minae


6.  For  Example  5 from  the  Bakhshali  Manuscript: 

(a)  Express  Equations  5 as  a homogeneous  linear  system  of  three  equations  in  four  unknowns  (x,  y,  z,  and 
K ) and  show  that  the  solution  set  has  one  arbitrary  parameter. 

(b)  Find  the  smallest  solution  for  which  all  four  variables  are  positive  integers. 


(c)  Show  that  the  solution  given  in  Example  5 is  included  among  your  solutions. 


Answer: 


(a)
\[
\begin{aligned}
5x + y + z - K &= 0 \\
x + 7y + z - K &= 0 \\
x + y + 8z - K &= 0
\end{aligned}
\]

$x = \dfrac{21t}{131}$, $y = \dfrac{14t}{131}$, $z = \dfrac{12t}{131}$, $K = t$, where t is an arbitrary number

(b) Take t = 131, so that x = 21, y = 14, z = 12, K = 131.

(c) Take t = 262, so that x = 42, y = 28, z = 24, K = 262.


7.  Solve  the  problems  posed  in  the  following  three  epigrams,  which  appear  in  a collection  entitled  “The 
Greek  Anthology,”  compiled  in  part  by  a scholar  named  Metrodorus  around  A.D.  500.  Some  of  its  46 
mathematical  problems  are  believed  to  date  as  far  back  as  600  B.C.  [Note:  Before  solving  parts  (a)  and  (c), 
you  will  have  to  formulate  the  question.] 

(a)  I desire  my  two  sons  to  receive  the  thousand  staters  of  which  I am  possessed,  but  let  the  fifth  part  of 
the  legitimate  one's  share  exceed  by  ten  the  fourth  part  of  what  falls  to  the  illegitimate  one. 

(b)  Make  me  a crown  weighing  sixty  minae,  mixing  gold  and  brass,  and  with  them  tin  and  much-wrought 
iron.  Let  the  gold  and  brass  together  form  two-thirds,  the  gold  and  tin  together  three-fourths,  and  the 
gold  and  iron  three-fifths.  Tell  me  how  much  gold  you  must  put  in,  how  much  brass,  how  much  tin, 
and  how  much  iron,  so  as  to  make  the  whole  crown  weigh  sixty  minae. 

(c)  First  person:  I have  what  the  second  has  and  the  third  of  what  the  third  has.  Second  person:  I have 
what  the  third  has  and  the  third  of  what  the  first  has.  Third  person:  And  I have  ten  minae  and  the  third 
of  what  the  second  has. 


Answer:


(a)


(b)


(c)


Technology Exercises


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB,
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a
scientific calculator with some linear algebra capabilities. For each exercise you will need to read the
relevant documentation for the particular utility you are using. The goal of these exercises is to provide you
with a basic proficiency with your technology utility. Once you have mastered the techniques in these
exercises, you will be able to use your technology utility to solve many of the problems in the regular
exercise sets.


T1.

(a)  Solve  Archimedes'  Cattle  Problem  using  a symbolic  algebra  program. 

(b) The Cattle Problem has a second part in which two additional conditions are imposed. The first of these
states that “When the white bulls mingled their number with the black, they stood firm, equal in depth and
breadth.” This requires that W + B be a square number, that is, 1, 4, 9, 16, 25, and so on. Show that this
requires that the values of k in Eq. 4 be restricted as follows:

$k = 4{,}456{,}749\,r^2$, $r = 1, 2, 3, \ldots$

and find the smallest total number of cattle that satisfies this second condition.

The second condition imposed in the second part of the Cattle Problem states that “When the
yellow and the dappled bulls were gathered into one herd, they stood in such a manner that their number,
beginning from one, grew slowly greater ’til it completed a triangular figure.” This requires that the quantity
Y + D be a triangular number, that is, a number of the form 1, 1 + 2, 1 + 2 + 3, 1 + 2 + 3 + 4, .... This
final part of the problem was not completely solved until 1965 when all 206,545 digits of the smallest
number of cattle that satisfies this condition were found using a computer.

T2.  The  following  problem  is  from  “The  Nine  Chapters  of  the  Mathematical  Art”  and  determines  a 
homogeneous  linear  system  of  five  equations  in  six  unknowns.  Show  that  the  system  has  infinitely  many 
solutions,  and  find  the  one  for  which  the  depth  of  the  well  and  the  lengths  of  the  five  ropes  are  the  smallest 
possible  positive  integers. 

Suppose  that  five  families  share  a well.  Suppose  further  that 

2 of A's ropes are short of the well's depth by one of B's ropes.

3 of B's ropes are short of the well's depth by one of C's ropes.

4 of C's ropes are short of the well's depth by one of D's ropes.

5 of D's ropes are short of the well's depth by one of E's ropes.

6 of E's ropes are short of the well's depth by one of A's ropes.
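For readers without a symbolic algebra program, the square-number condition in Exercise T1(b) can be explored with a general-purpose language. The sketch below assumes the values of W and B from Eq. 4 with k = 1 and uses the fact that (W + B)k is a perfect square for the smallest k exactly when k is the product of the primes that divide W + B to an odd power. The function name `squarefree_part` is our own, and trial division is adequate only because W + B is small.

```python
from math import isqrt

W, B = 10_366_482, 7_460_514          # white and black bulls for k = 1
TOTAL = 50_389_082                    # sum of all eight herd counts for k = 1

def squarefree_part(n):
    """Product of the primes that divide n to an odd power."""
    result, p = 1, 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        if exponent % 2 == 1:
            result *= p
        p += 1
    return result * n                 # leftover n is 1 or a single prime

k_min = squarefree_part(W + B)        # smallest k making (W + B) * k a square
root = isqrt((W + B) * k_min)
print(k_min, root * root == (W + B) * k_min, TOTAL * k_min)
```

Multiplying `k_min` by any perfect square r² preserves the square property, which accounts for the factor r² in the restriction on k.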
10.4 Cubic Spline Interpolation

In  this  section  an  artist’s  drafting  aid  is  used  as  a physical  model  for  the  mathematical  problem  of  finding  a curve  that  passes 
through  specified  points  in  the  plane.  The  parameters  of  the  curve  are  determined  by  solving  a linear  system  of  equations. 


Prerequisites 

Linear  Systems 
Matrix  Algebra 
Differential  Calculus 


Curve  Fitting 

Fitting  a curve  through  specified  points  in  the  plane  is  a common  problem  encountered  in  analyzing  experimental  data,  in 
ascertaining  the  relations  among  variables,  and  in  design  work.  A ubiquitous  application  is  in  the  design  and  description  of 
computer  and  printer  fonts,  such  as  PostScript™  and  TrueType™  fonts  (Figure  10.4.1).  In  Figure  10.4.2  seven  points  in  the 
xy-plane  are  displayed,  and  in  Figure  10.4.4  a smooth  curve  has  been  drawn  that  passes  through  them.  A curve  that  passes 
through  a set  of  points  in  the  plane  is  said  to  interpolate  those  points,  and  the  curve  is  called  an  interpolating  curve  for  those 
points.  The  interpolating  curve  in  Figure  10.4.4  was  drawn  with  the  aid  of  a drafting  spline  (Figure  10.4.3).  This  drafting  aid 
consists  of  a thin,  flexible  strip  of  wood  or  other  material  that  is  bent  to  pass  through  the  points  to  be  interpolated.  Attached 
sliding  weights  hold  the  spline  in  position  while  the  artist  draws  the  interpolating  curve.  The  drafting  spline  will  serve  as  the 
physical  model  for  a mathematical  theory  of  interpolation  that  we  will  discuss  in  this  section. 


Figure  10.4.1 




Figure  10.4.2 


Figure  10.4.3 


Figure  10.4.4 


Statement  of  the  Problem 

Suppose that we are given n points in the xy-plane,

\[ (x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n) \]

which we wish to interpolate with a “well-behaved” curve (Figure 10.4.5). For convenience, we take the points to be equally
spaced in the x-direction, although our results can easily be extended to the case of unequally spaced points. If we let the
common distance between the x-coordinates of the points be h, then we have

\[ x_2 - x_1 = x_3 - x_2 = \cdots = x_n - x_{n-1} = h \]

Let $y = S(x)$, $x_1 \le x \le x_n$ denote the interpolating curve that we seek. We assume that this curve describes the displacement of
a drafting spline that interpolates the n points when the weights holding down the spline are situated precisely at the n points. It
is known from linear beam theory that for small displacements, the fourth derivative of the displacement of a beam is zero along
any interval of the x-axis that contains no external forces acting on the beam. If we treat our drafting spline as a thin beam and
realize that the only external forces acting on it arise from the weights at the n specified points, then it follows that

\[ S^{(iv)}(x) = 0 \tag{1} \]

for values of x lying in the n - 1 open intervals

\[ (x_1, x_2),\ (x_2, x_3),\ \ldots,\ (x_{n-1}, x_n) \]

between the n points.


We also need the result from linear beam theory that states that for a beam acted upon only by external forces, the displacement
must have two continuous derivatives. In the case of the interpolating curve $y = S(x)$ constructed by the drafting spline, this
means that $S(x)$, $S'(x)$, and $S''(x)$ must be continuous for $x_1 \le x \le x_n$.

The condition that $S''(x)$ be continuous is what causes a drafting spline to produce a pleasing curve, as it results in continuous
curvature. The eye can perceive sudden changes in curvature (that is, discontinuities in $S''(x)$), but sudden changes in higher
derivatives are not discernible. Thus, the condition that $S''(x)$ be continuous is the minimal prerequisite for the interpolating
curve to be perceptible as a single smooth curve, rather than as a series of separate curves pieced together.


To determine the mathematical form of the function $S(x)$, we observe that because $S^{(iv)}(x) = 0$ in the intervals between the n
specified points, it follows by integrating this equation four times that $S(x)$ must be a cubic polynomial in x in each such
interval. In general, however, $S(x)$ will be a different cubic polynomial in each interval, so $S(x)$ must have the form

\[
S(x) =
\begin{cases}
S_1(x), & x_1 \le x \le x_2 \\
S_2(x), & x_2 \le x \le x_3 \\
\quad\vdots \\
S_{n-1}(x), & x_{n-1} \le x \le x_n
\end{cases}
\tag{2}
\]

where $S_1(x), S_2(x), \ldots, S_{n-1}(x)$ are cubic polynomials. For convenience, we will write these in the form

\[
\begin{aligned}
S_1(x) &= a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\
S_2(x) &= a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\
&\ \,\vdots \\
S_{n-1}(x) &= a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned}
\tag{3}
\]

The $a_i$'s, $b_i$'s, $c_i$'s, and $d_i$'s constitute a total of 4n - 4 coefficients that we must determine to specify $S(x)$ completely. If we
choose these coefficients so that $S(x)$ interpolates the n specified points in the plane and $S(x)$, $S'(x)$, and $S''(x)$ are
continuous, then the resulting interpolating curve is called a cubic spline.
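To make the piecewise form (3) concrete, the following sketch evaluates S(x) once the coefficients are known. The function name, the argument layout, and the sample data are our own illustrative assumptions; the sample coefficients simply rewrite the single cubic x³ about each left endpoint and are not the result of a spline computation.

```python
from bisect import bisect_right

def evaluate_spline(knots, coeffs, x):
    """Evaluate S(x) from per-interval coefficients (a_i, b_i, c_i, d_i).

    knots  : [x_1, ..., x_n] in increasing order
    coeffs : n - 1 tuples, one for each interval [x_i, x_{i+1}]
    """
    if not knots[0] <= x <= knots[-1]:
        raise ValueError("x lies outside the interpolation interval")
    # Index of the interval containing x; the last knot uses the last piece.
    i = min(bisect_right(knots, x) - 1, len(coeffs) - 1)
    a, b, c, d = coeffs[i]
    t = x - knots[i]
    return ((a * t + b) * t + c) * t + d     # Horner form of the cubic

# Illustrative data: S(x) = x**3 on [0, 2], rewritten about each left knot.
knots = [0.0, 1.0, 2.0]
coeffs = [(1.0, 0.0, 0.0, 0.0), (1.0, 3.0, 3.0, 1.0)]
print(evaluate_spline(knots, coeffs, 1.5))
```

At an interior knot the routine uses the piece to the right; for a genuine cubic spline the choice does not matter, since adjacent pieces agree there.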


Derivation  of  the  Formula  of  a Cubic  Spline 


From  Equations  2 and  3,  we  have 

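\[
\begin{aligned}
S(x) &= S_1(x) = a_1(x - x_1)^3 + b_1(x - x_1)^2 + c_1(x - x_1) + d_1, & x_1 \le x \le x_2 \\
S(x) &= S_2(x) = a_2(x - x_2)^3 + b_2(x - x_2)^2 + c_2(x - x_2) + d_2, & x_2 \le x \le x_3 \\
&\ \,\vdots \\
S(x) &= S_{n-1}(x) = a_{n-1}(x - x_{n-1})^3 + b_{n-1}(x - x_{n-1})^2 + c_{n-1}(x - x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned}
\tag{4}
\]

so

\[
\begin{aligned}
S'(x) &= S_1'(x) = 3a_1(x - x_1)^2 + 2b_1(x - x_1) + c_1, & x_1 \le x \le x_2 \\
S'(x) &= S_2'(x) = 3a_2(x - x_2)^2 + 2b_2(x - x_2) + c_2, & x_2 \le x \le x_3 \\
&\ \,\vdots \\
S'(x) &= S_{n-1}'(x) = 3a_{n-1}(x - x_{n-1})^2 + 2b_{n-1}(x - x_{n-1}) + c_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned}
\tag{5}
\]

and

\[
\begin{aligned}
S''(x) &= S_1''(x) = 6a_1(x - x_1) + 2b_1, & x_1 \le x \le x_2 \\
S''(x) &= S_2''(x) = 6a_2(x - x_2) + 2b_2, & x_2 \le x \le x_3 \\
&\ \,\vdots \\
S''(x) &= S_{n-1}''(x) = 6a_{n-1}(x - x_{n-1}) + 2b_{n-1}, & x_{n-1} \le x \le x_n
\end{aligned}
\tag{6}
\]

We will now use these equations and the four properties of cubic splines stated below to express the unknown coefficients
$a_i, b_i, c_i, d_i$, $i = 1, 2, \ldots, n - 1$, in terms of the known coordinates $y_1, y_2, \ldots, y_n$.

1. $S(x)$ interpolates the points $(x_i, y_i)$, $i = 1, 2, \ldots, n$.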


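Because $S(x)$ interpolates the points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, we have

\[ S(x_1) = y_1,\quad S(x_2) = y_2,\quad \ldots,\quad S(x_n) = y_n \tag{7} \]

From the first n - 1 of these equations and 4, we obtain

\[
\begin{aligned}
d_1 &= y_1 \\
d_2 &= y_2 \\
&\ \,\vdots \\
d_{n-1} &= y_{n-1}
\end{aligned}
\tag{8}
\]

From the last equation in 7, the last equation in 4, and the fact that $x_n - x_{n-1} = h$, we obtain

\[ a_{n-1}h^3 + b_{n-1}h^2 + c_{n-1}h + d_{n-1} = y_n \tag{9} \]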


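2. $S(x)$ is continuous on $[x_1, x_n]$.

Because $S(x)$ is continuous for $x_1 \le x \le x_n$, it follows that at each point $x_i$ in the set $x_2, x_3, \ldots, x_{n-1}$ we must have

\[
S_{i-1}(x_i) = S_i(x_i), \qquad i = 2, 3, \ldots, n-1
\tag{10}
\]

Otherwise, the graphs of $S_{i-1}(x)$ and $S_i(x)$ would not join together to form a continuous curve at $x_i$. When we apply the interpolating property $S_i(x_i) = y_i$, it follows from 10 that $S_{i-1}(x_i) = y_i$, $i = 2, 3, \ldots, n-1$, or from 4 that

\[
\begin{aligned}
a_1h^3 + b_1h^2 + c_1h + d_1 &= y_2 \\
a_2h^3 + b_2h^2 + c_2h + d_2 &= y_3 \\
&\ \ \vdots \\
a_{n-2}h^3 + b_{n-2}h^2 + c_{n-2}h + d_{n-2} &= y_{n-1}
\end{aligned}
\tag{11}
\]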

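3. $S'(x)$ is continuous on $[x_1, x_n]$.

Because $S'(x)$ is continuous for $x_1 \le x \le x_n$, it follows that

\[
S_{i-1}'(x_i) = S_i'(x_i), \qquad i = 2, 3, \ldots, n-1
\]

or, from 5,

\[
\begin{aligned}
3a_1h^2 + 2b_1h + c_1 &= c_2 \\
3a_2h^2 + 2b_2h + c_2 &= c_3 \\
&\ \ \vdots \\
3a_{n-2}h^2 + 2b_{n-2}h + c_{n-2} &= c_{n-1}
\end{aligned}
\tag{12}
\]

4. $S''(x)$ is continuous on $[x_1, x_n]$.

Because $S''(x)$ is continuous for $x_1 \le x \le x_n$, it follows that

\[
S_{i-1}''(x_i) = S_i''(x_i), \qquad i = 2, 3, \ldots, n-1
\]

or, from 6,

\[
\begin{aligned}
6a_1h + 2b_1 &= 2b_2 \\
6a_2h + 2b_2 &= 2b_3 \\
&\ \ \vdots \\
6a_{n-2}h + 2b_{n-2} &= 2b_{n-1}
\end{aligned}
\tag{13}
\]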


Equations 8, 9, 11, 12, and 13 constitute a system of $4n - 6$ linear equations in the $4n - 4$ unknown coefficients $a_i, b_i, c_i, d_i$, $i = 1, 2, \ldots, n-1$. Consequently, we need two more equations to determine these coefficients uniquely. Before obtaining these additional equations, however, we can simplify our existing system by expressing the unknowns $a_i$, $b_i$, $c_i$, and $d_i$ in terms of new unknown quantities

\[
M_1 = S''(x_1), \quad M_2 = S''(x_2), \quad \ldots, \quad M_n = S''(x_n)
\]

and the known quantities

\[
y_1, y_2, \ldots, y_n
\]

For example, from 6 it follows that

\[
M_1 = 2b_1, \quad M_2 = 2b_2, \quad \ldots, \quad M_{n-1} = 2b_{n-1}
\]

so

\[
b_1 = \tfrac{1}{2}M_1, \quad b_2 = \tfrac{1}{2}M_2, \quad \ldots, \quad b_{n-1} = \tfrac{1}{2}M_{n-1}
\]

Moreover, we already know from 8 that

\[
d_1 = y_1, \quad d_2 = y_2, \quad \ldots, \quad d_{n-1} = y_{n-1}
\]

We leave it as an exercise for you to derive the expressions for the $a_i$'s and $c_i$'s in terms of the $M_i$'s and $y_i$'s. The final result is as follows:

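THEOREM 10.4.1 Cubic Spline Interpolation

Given $n$ points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ with $x_{i+1} - x_i = h$, $i = 1, 2, \ldots, n-1$, the cubic spline

\[
S(x) =
\begin{cases}
a_1(x-x_1)^3 + b_1(x-x_1)^2 + c_1(x-x_1) + d_1, & x_1 \le x \le x_2 \\
a_2(x-x_2)^3 + b_2(x-x_2)^2 + c_2(x-x_2) + d_2, & x_2 \le x \le x_3 \\
\quad \vdots \\
a_{n-1}(x-x_{n-1})^3 + b_{n-1}(x-x_{n-1})^2 + c_{n-1}(x-x_{n-1}) + d_{n-1}, & x_{n-1} \le x \le x_n
\end{cases}
\]

that interpolates these points has coefficients given by

\[
\begin{aligned}
a_i &= (M_{i+1} - M_i)/6h \\
b_i &= M_i/2 \\
c_i &= (y_{i+1} - y_i)/h - [(M_{i+1} + 2M_i)h/6] \\
d_i &= y_i
\end{aligned}
\tag{14}
\]

for $i = 1, 2, \ldots, n-1$, where $M_i = S''(x_i)$, $i = 1, 2, \ldots, n$.

From this result, we see that the quantities $M_1, M_2, \ldots, M_n$ uniquely determine the cubic spline. To find these quantities, we substitute the expressions for $a_i$, $b_i$, and $c_i$ given in 14 into 12. After some algebraic simplification, we obtain

\[
\begin{aligned}
M_1 + 4M_2 + M_3 &= 6(y_1 - 2y_2 + y_3)/h^2 \\
M_2 + 4M_3 + M_4 &= 6(y_2 - 2y_3 + y_4)/h^2 \\
&\ \ \vdots \\
M_{n-2} + 4M_{n-1} + M_n &= 6(y_{n-2} - 2y_{n-1} + y_n)/h^2
\end{aligned}
\tag{15}
\]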


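or, in matrix form,

\[
\begin{bmatrix}
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 4 & \cdots & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 4 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 & 0 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 4 & 1
\end{bmatrix}
\begin{bmatrix}
M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \\ M_n
\end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix}
y_1 - 2y_2 + y_3 \\
y_2 - 2y_3 + y_4 \\
y_3 - 2y_4 + y_5 \\
\vdots \\
y_{n-4} - 2y_{n-3} + y_{n-2} \\
y_{n-3} - 2y_{n-2} + y_{n-1} \\
y_{n-2} - 2y_{n-1} + y_n
\end{bmatrix}
\]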

This is a linear system of $n - 2$ equations for the $n$ unknowns $M_1, M_2, \ldots, M_n$. Thus, we still need two additional equations to determine $M_1, M_2, \ldots, M_n$ uniquely. The reason for this is that there are infinitely many cubic splines that interpolate the given points, so we simply do not have enough conditions to determine a unique cubic spline passing through the points. We discuss below three possible ways of specifying the two additional conditions required to obtain a unique cubic spline through the points. (The exercises present two more.) They are summarized in Table 1.

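Table 1

Natural Spline
    Description: The second derivative of the spline is zero at the endpoints.
    End conditions: $M_1 = 0$, $M_n = 0$
    Linear system:
\[
\begin{bmatrix}
4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix}
\]

Parabolic Runout Spline
    Description: The spline reduces to a parabolic curve on the first and last intervals.
    End conditions: $M_1 = M_2$, $M_n = M_{n-1}$
    Linear system:
\[
\begin{bmatrix}
5 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 5
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix}
\]

Cubic Runout Spline
    Description: The spline is a single cubic curve on the first two and last two intervals.
    End conditions: $M_1 = 2M_2 - M_3$, $M_n = 2M_{n-1} - M_{n-2}$
    Linear system:
\[
\begin{bmatrix}
6 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 0 & 6
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix}
\]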
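Once the quantities $M_1, \ldots, M_n$ are known, Formulas 14 give the spline coefficients directly. The following Python sketch illustrates this step only; the function name `spline_coefficients` and the sample data (values of $y = x^3$ at $x = 0, 1, 2$, for which $M_i = 6x_i$) are our own illustrative assumptions and do not come from the text.

```python
def spline_coefficients(y, M, h):
    """Return a list of (a_i, b_i, c_i, d_i), one tuple per interval,
    computed from Formulas 14 for equally spaced points with spacing h."""
    coeffs = []
    for i in range(len(y) - 1):
        a = (M[i + 1] - M[i]) / (6 * h)
        b = M[i] / 2
        c = (y[i + 1] - y[i]) / h - (M[i + 1] + 2 * M[i]) * h / 6
        d = y[i]
        coeffs.append((a, b, c, d))
    return coeffs


# Illustrative data (an assumption, not from the text): y = x^3 sampled at
# x = 0, 1, 2, so the second derivatives are M_i = 6 * x_i.
y = [0.0, 1.0, 8.0]
M = [0.0, 6.0, 12.0]
coeffs = spline_coefficients(y, M, h=1.0)
# On [1, 2] this gives (x-1)^3 + 3(x-1)^2 + 3(x-1) + 1, which expands to x^3.
```

Because the sample data lie on a single cubic and the exact second derivatives were supplied, each piece reproduces $x^3$ on its interval.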

The  Natural  Spline 

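The two simplest mathematical conditions we can impose are

\[
M_1 = M_n = 0
\]

These conditions together with 15 result in an $n \times n$ linear system for $M_1, M_2, \ldots, M_n$, which can be written in matrix form as

\[
\begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ M_3 \\ \vdots \\ M_{n-1} \\ M_n \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} 0 \\ y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-2} - 2y_{n-1} + y_n \\ 0 \end{bmatrix}
\]

For numerical calculations it is more convenient to eliminate $M_1$ and $M_n$ from this system and write

\[
\begin{bmatrix}
4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix}
\tag{16}
\]

together with

\[
M_1 = 0 \tag{17}
\]

\[
M_n = 0 \tag{18}
\]

Thus, the $(n-2) \times (n-2)$ linear system can be solved for the $n - 2$ coefficients $M_2, M_3, \ldots, M_{n-1}$, and $M_1$ and $M_n$ are determined by 17 and 18.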

Physically, the natural spline results when the ends of a drafting spline extend freely beyond the interpolating points without constraint. The end portions of the spline outside the interpolating points will fall on straight line paths, causing $S''(x)$ to vanish at the endpoints $x_1$ and $x_n$ and resulting in the mathematical conditions $M_1 = M_n = 0$.

The natural spline tends to flatten the interpolating curve at the endpoints, which may be undesirable. Of course, if it is required that $S''(x)$ vanish at the endpoints, then the natural spline must be used.
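The reduced system 16 is tridiagonal, so it can be solved in a single forward and backward sweep. The sketch below is one possible way to do this in Python; the helper names `solve_tridiagonal` and `natural_spline_M` are hypothetical, and the elimination scheme (often called the Thomas algorithm) is our choice rather than a method prescribed by the text. The sample points $(0, 0)$, $(.5, 1)$, $(1, 0)$ are those of Exercise 6(a).

```python
def solve_tridiagonal(sub, diag, sup, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution.
    sub[i] multiplies x[i-1] in row i (sub[0] is unused);
    sup[i] multiplies x[i+1] in row i (sup[-1] is unused)."""
    n = len(diag)
    d = list(diag)
    r = list(rhs)
    for i in range(1, n):
        w = sub[i] / d[i - 1]
        d[i] -= w * sup[i - 1]
        r[i] -= w * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - sup[i] * x[i + 1]) / d[i]
    return x


def natural_spline_M(y, h):
    """Return [M_1, ..., M_n] for the natural spline (system 16 with M_1 = M_n = 0)."""
    n = len(y)
    m = n - 2  # number of interior unknowns M_2, ..., M_{n-1}
    rhs = [6 * (y[i - 1] - 2 * y[i] + y[i + 1]) / h ** 2 for i in range(1, n - 1)]
    interior = solve_tridiagonal([1.0] * m, [4.0] * m, [1.0] * m, rhs)
    return [0.0] + interior + [0.0]


# Points (0, 0), (.5, 1), (1, 0): the single equation is 4 M_2 = 6(0 - 2 + 0)/.25.
M = natural_spline_M([0.0, 1.0, 0.0], h=0.5)
```

With three points the system has the single unknown $M_2$, and the sketch returns $M_2 = -12$ together with $M_1 = M_3 = 0$.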


The  Parabolic  Runout  Spline 

The two additional constraints imposed for this type of spline are

\[
M_1 = M_2 \tag{19}
\]

\[
M_n = M_{n-1} \tag{20}
\]

If we use the preceding two equations to eliminate $M_1$ and $M_n$ from 15, we obtain the $(n-2) \times (n-2)$ linear system

\[
\begin{bmatrix}
5 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 & 5
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix}
\tag{21}
\]

for $M_2, M_3, \ldots, M_{n-1}$. Once these $n - 2$ values have been determined, $M_1$ and $M_n$ are determined from 19 and 20.

From 14 we see that $M_1 = M_2$ implies that $a_1 = 0$, and $M_n = M_{n-1}$ implies that $a_{n-1} = 0$. Thus, from 3 there are no cubic terms in the formula for the spline over the end intervals $[x_1, x_2]$ and $[x_{n-1}, x_n]$. Hence, as the name suggests, the parabolic runout spline reduces to a parabolic curve over these end intervals.


The  Cubic  Runout  Spline 


For this type of spline, we impose the two additional conditions

\[
M_1 = 2M_2 - M_3 \tag{22}
\]

\[
M_n = 2M_{n-1} - M_{n-2} \tag{23}
\]

Using these two equations to eliminate $M_1$ and $M_n$ from 15 results in the following $(n-2) \times (n-2)$ linear system for $M_2, M_3, \ldots, M_{n-1}$:

\[
\begin{bmatrix}
6 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & 4 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0 & 6
\end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ M_4 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} y_1 - 2y_2 + y_3 \\ y_2 - 2y_3 + y_4 \\ y_3 - 2y_4 + y_5 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_n \end{bmatrix}
\tag{24}
\]

After we solve this linear system for $M_2, M_3, \ldots, M_{n-1}$, we can use 22 and 23 to determine $M_1$ and $M_n$.

If we rewrite 22 as

\[
M_2 - M_1 = M_3 - M_2
\]

it follows from 14 that $a_1 = a_2$. Because $S'''(x) = 6a_1$ on $[x_1, x_2]$ and $S'''(x) = 6a_2$ on $[x_2, x_3]$, we see that $S'''(x)$ is constant over the entire interval $[x_1, x_3]$. Consequently, $S(x)$ consists of a single cubic curve over the interval $[x_1, x_3]$ rather than two different cubic curves pieced together at $x_2$. [To see this, integrate $S'''(x)$ three times.] A similar analysis shows that $S(x)$ consists of a single cubic curve over the last two intervals.

Whereas the natural spline tends to produce an interpolating curve that is flat at the endpoints, the cubic runout spline has the opposite tendency: it produces a curve with pronounced curvature at the endpoints. If neither behavior is desired, the parabolic runout spline is a reasonable compromise.
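As a numerical check on the cubic runout conditions, the following Python sketch solves system 24 and then applies 22 and 23. The function names are hypothetical, the sample data ($y = x^3$ at $x = 0, 1, 2, 3, 4$) are an illustrative assumption, and at least four points are assumed. Since the data lie on a single cubic, the computed $M_i$ should agree with the exact second derivatives $6x_i$.

```python
def solve_tridiagonal(sub, diag, sup, rhs):
    """Forward elimination and back substitution for a tridiagonal system.
    sub[i] multiplies x[i-1] in row i; sup[i] multiplies x[i+1] in row i."""
    n = len(diag)
    d = list(diag)
    r = list(rhs)
    for i in range(1, n):
        w = sub[i] / d[i - 1]
        d[i] -= w * sup[i - 1]
        r[i] -= w * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - sup[i] * x[i + 1]) / d[i]
    return x


def cubic_runout_M(y, h):
    """Return [M_1, ..., M_n] for the cubic runout spline (assumes n >= 4)."""
    n = len(y)
    m = n - 2
    rhs = [6 * (y[i - 1] - 2 * y[i] + y[i + 1]) / h ** 2 for i in range(1, n - 1)]
    # First row of system 24 is [6 0 0 ...]; last row is [... 0 0 6].
    diag = [6.0] + [4.0] * (m - 2) + [6.0]
    sub = [0.0] + [1.0] * (m - 2) + [0.0]
    sup = [0.0] + [1.0] * (m - 2) + [0.0]
    interior = solve_tridiagonal(sub, diag, sup, rhs)
    first = 2 * interior[0] - interior[1]    # Equation 22
    last = 2 * interior[-1] - interior[-2]   # Equation 23
    return [first] + interior + [last]


# Illustrative data (an assumption): y = x^3 at x = 0, 1, 2, 3, 4 with h = 1.
M = cubic_runout_M([0.0, 1.0, 8.0, 27.0, 64.0], h=1.0)
```

For these data the result is, up to rounding, $M = (0, 6, 12, 18, 24)$, which matches $S''(x) = 6x$ at the five points.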

EXAMPLE  1 Using  a Parabolic  Runout  Spline 

The  density  of  water  is  well  known  to  reach  a maximum  at  a temperature  slightly  above  freezing.  Table  2,  from 
the  Handbook  of  Chemistry  and  Physics  (CRC  Press,  2009),  gives  the  density  of  water  in  grams  per  cubic 
centimeter for five equally spaced temperatures from $-10^\circ$C to $30^\circ$C. We will interpolate these five
temperature-density  measurements  with  a parabolic  runout  spline  and  attempt  to  find  the  maximum  density  of 
water  in  this  range  by  finding  the  maximum  value  on  this  cubic  spline.  In  the  exercises  we  ask  you  to  perform 
similar  calculations  using  a natural  spline  and  a cubic  runout  spline  to  interpolate  the  data  points. 


Table 2

Temperature (°C)    Density (g/cm³)
      -10               .99815
        0               .99987
       10               .99973
       20               .99823
       30               .99567

Set

\[
\begin{aligned}
x_1 &= -10, & y_1 &= .99815 \\
x_2 &= 0, & y_2 &= .99987 \\
x_3 &= 10, & y_3 &= .99973 \\
x_4 &= 20, & y_4 &= .99823 \\
x_5 &= 30, & y_5 &= .99567
\end{aligned}
\]

Then

\[
\begin{aligned}
6[y_1 - 2y_2 + y_3]/h^2 &= -.0001116 \\
6[y_2 - 2y_3 + y_4]/h^2 &= -.0000816 \\
6[y_3 - 2y_4 + y_5]/h^2 &= -.0000636
\end{aligned}
\]

and the linear system 21 for the parabolic runout spline becomes

\[
\begin{bmatrix} 5 & 1 & 0 \\ 1 & 4 & 1 \\ 0 & 1 & 5 \end{bmatrix}
\begin{bmatrix} M_2 \\ M_3 \\ M_4 \end{bmatrix}
=
\begin{bmatrix} -.0001116 \\ -.0000816 \\ -.0000636 \end{bmatrix}
\]

Solving this system yields

\[
\begin{aligned}
M_2 &= -.00001973 \\
M_3 &= -.00001293 \\
M_4 &= -.00001013
\end{aligned}
\]

From 19 and 20, we have

\[
\begin{aligned}
M_1 &= M_2 = -.00001973 \\
M_5 &= M_4 = -.00001013
\end{aligned}
\]

Solving for the $a_i$'s, $b_i$'s, $c_i$'s, and $d_i$'s in 14, we obtain the following expression for the interpolating parabolic runout spline:

\[
S(x) =
\begin{cases}
-.00000987(x+10)^2 + .0002707(x+10) + .99815, & -10 \le x \le 0 \\
.000000113(x-0)^3 - .00000987(x-0)^2 + .0000733(x-0) + .99987, & 0 \le x \le 10 \\
.000000047(x-10)^3 - .00000647(x-10)^2 - .0000900(x-10) + .99973, & 10 \le x \le 20 \\
-.00000507(x-20)^2 - .0002053(x-20) + .99823, & 20 \le x \le 30
\end{cases}
\]

This spline is plotted in Figure 10.4.6. From that figure we see that the maximum is attained in the interval $[0, 10]$. To find this maximum, we set $S'(x)$ equal to zero in the interval $[0, 10]$:

\[
S'(x) = .000000339x^2 - .0000197x + .0000733 = 0
\]

To three significant digits the root of this quadratic in the interval $[0, 10]$ is $x = 3.99$, and for this value of $x$, $S(3.99) = 1.00001$. Thus, according to our interpolated estimate, the maximum density of water is $1.00001\ \text{g/cm}^3$ attained at $3.99^\circ$C. This agrees well with the experimental maximum density of $1.00000\ \text{g/cm}^3$ attained at $3.98^\circ$C. (In the original metric system, the gram was defined as the mass of one cubic centimeter of water at its maximum density.)
Figure 10.4.6
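The calculations in Example 1 can be reproduced with a short program. The Python sketch below is one way to do so under our own assumptions: the helper `solve_tridiagonal` and the variable names are hypothetical, and the maximum is located by applying the quadratic formula to $S'(x) = 0$ on $[0, 10]$. The data are those of Table 2.

```python
import math


def solve_tridiagonal(sub, diag, sup, rhs):
    """Forward elimination and back substitution for a tridiagonal system.
    sub[i] multiplies x[i-1] in row i; sup[i] multiplies x[i+1] in row i."""
    n = len(diag)
    d = list(diag)
    r = list(rhs)
    for i in range(1, n):
        w = sub[i] / d[i - 1]
        d[i] -= w * sup[i - 1]
        r[i] -= w * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - sup[i] * x[i + 1]) / d[i]
    return x


dens = [0.99815, 0.99987, 0.99973, 0.99823, 0.99567]  # Table 2 densities
h = 10.0                                               # temperature spacing

# Right side of system 21, then the 3 x 3 parabolic runout system.
rhs = [6 * (dens[i - 1] - 2 * dens[i] + dens[i + 1]) / h ** 2 for i in range(1, 4)]
M2, M3, M4 = solve_tridiagonal([0.0, 1.0, 1.0], [5.0, 4.0, 5.0], [1.0, 1.0, 0.0], rhs)
M = [M2, M2, M3, M4, M4]  # Equations 19 and 20 supply M_1 and M_5

# Coefficients on [0, 10] (the second interval) from Formulas 14.
a = (M[2] - M[1]) / (6 * h)
b = M[1] / 2
c = (dens[2] - dens[1]) / h - (M[2] + 2 * M[1]) * h / 6
d = dens[1]

# S'(x) = 3a x^2 + 2b x + c = 0; the smaller root is the one lying in [0, 10].
disc = (2 * b) ** 2 - 4 * (3 * a) * c
x_max = (-2 * b - math.sqrt(disc)) / (2 * 3 * a)
s_max = a * x_max ** 3 + b * x_max ** 2 + c * x_max + d
```

Here `x_max` and `s_max` agree with the values $x = 3.99$ and $S(3.99) = 1.00001$ quoted in the example when rounded to the same number of digits.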


Closing  Remarks 

In  addition  to  producing  excellent  interpolating  curves,  cubic  splines  and  their  generalizations  are  useful  for  numerical 
integration  and  differentiation,  for  the  numerical  solution  of  differential  and  integral  equations,  and  in  optimization  theory. 

Exercise  Set  10.4 

1. Derive the expressions for $a_i$ and $c_i$ in Equations 14 of Theorem 10.4.1.

2.  The  six  points 

(0,  .00000),  (.2,  .19867),  (.4,  .38942), 

(.6,  .56464),  (.8,  .71736),  (1.0,  .84147) 

lie  on  the  graph  of  y = sin  x,  where  x is  in  radians. 

(a)  Find  the  portion  of  the  parabolic  runout  spline  that  interpolates  these  six  points  for  .4  < x < .6.  Maintain  an  accuracy  of 
five  decimal  places  in  your  calculations. 

(b) Calculate $S(.5)$ for the spline you found in part (a). What is the percentage error of $S(.5)$ with respect to the “exact” value of $\sin(.5) = .47943$?

Answer:

(a) $S(x) = -.12643(x - .4)^3 - .20211(x - .4)^2 + .92158(x - .4) + .38942$

(b) $S(.5) = .47943$; error = 0%

3.  The  following  five  points 

(0,1),  (1,7),  (2,  27),  (3,79),  (4,  181) 

lie  on  a single  cubic  curve. 

(a)  Which  of  the  three  types  of  cubic  splines  (natural,  parabolic  runout,  or  cubic  runout)  would  agree  exactly  with  the  single 
cubic  curve  on  which  the  five  points  lie? 

(b)  Determine  the  cubic  spline  you  chose  in  part  (a),  and  verify  that  it  is  a single  cubic  curve  that  interpolates  the  five  points. 
Answer: 


(a)  The  cubic  runout  spline 


(b)  S(x)  = 3x3  - 2x2  + 5x  + 1 

4.  Repeat  the  calculations  in  Example  1 using  a natural  spline  to  interpolate  the  five  data  points. 


Answer: 


\[
S(x) =
\begin{cases}
-.00000042(x+10)^3 + .000214(x+10) + .99815, & -10 \le x \le 0 \\
.00000024(x)^3 - .0000126(x)^2 + .000088(x) + .99987, & 0 \le x \le 10 \\
-.00000004(x-10)^3 - .0000054(x-10)^2 - .000092(x-10) + .99973, & 10 \le x \le 20 \\
.00000022(x-20)^3 - .0000066(x-20)^2 - .000212(x-20) + .99823, & 20 \le x \le 30
\end{cases}
\]

Maximum at $(x, S(x)) = (3.93, 1.00004)$

5.  Repeat  the  calculations  in  Example  1 using  a cubic  runout  spline  to  interpolate  the  five  data  points. 
Answer: 


\[
S(x) =
\begin{cases}
.00000009(x+10)^3 - .0000121(x+10)^2 + .000282(x+10) + .99815, & -10 \le x \le 0 \\
.00000009(x)^3 - .0000093(x)^2 + .000070(x) + .99987, & 0 \le x \le 10 \\
.00000004(x-10)^3 - .0000066(x-10)^2 - .000088(x-10) + .99973, & 10 \le x \le 20 \\
.00000004(x-20)^3 - .0000053(x-20)^2 - .000207(x-20) + .99823, & 20 \le x \le 30
\end{cases}
\]

Maximum at $(x, S(x)) = (4.00, 1.00001)$

6. Consider the five points $(0, 0)$, $(.5, 1)$, $(1, 0)$, $(1.5, -1)$, and $(2, 0)$ on the graph of $y = \sin(\pi x)$.

(a)  Use  a natural  spline  to  interpolate  the  data  points  (0,  0),  (.5,  1),  and  (1,  0). 

(b)  Use  a natural  spline  to  interpolate  the  data  points  (.5,  1),  (1,  0),  and  (1.5,  — 1). 

(c)  Explain  the  unusual  nature  of  your  result  in  part  (b). 


Answer: 


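(a)
\[
S(x) =
\begin{cases}
-4x^3 + 3x, & 0 \le x \le 0.5 \\
4x^3 - 12x^2 + 9x - 1, & 0.5 \le x \le 1
\end{cases}
\]

(b)
\[
S(x) =
\begin{cases}
2 - 2x, & 0.5 \le x \le 1 \\
2 - 2x, & 1 \le x \le 1.5
\end{cases}
\]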


(c)  The  three  data  points  are  collinear. 


7. (The Periodic Spline) If it is known or if it is desired that the $n$ points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ to be interpolated lie on a single cycle of a periodic curve with period $x_n - x_1$, then an interpolating cubic spline $S(x)$ must satisfy

\[
\begin{aligned}
S(x_1) &= S(x_n) \\
S'(x_1) &= S'(x_n) \\
S''(x_1) &= S''(x_n)
\end{aligned}
\]

(a) Show that these three periodicity conditions require that

\[
\begin{aligned}
y_1 &= y_n \\
M_1 &= M_n \\
4M_1 + M_2 + M_{n-1} &= 6(y_{n-1} - 2y_1 + y_2)/h^2
\end{aligned}
\]


(b) Using the three equations in part (a) and Equations 15, construct an $(n-1) \times (n-1)$ linear system for $M_1, M_2, \ldots, M_{n-1}$ in matrix form.

Answer: 
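\[
\begin{bmatrix}
4 & 1 & 0 & 0 & \cdots & 0 & 0 & 1 \\
1 & 4 & 1 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & 4 & 1 \\
1 & 0 & 0 & 0 & \cdots & 0 & 1 & 4
\end{bmatrix}
\begin{bmatrix} M_1 \\ M_2 \\ \vdots \\ M_{n-2} \\ M_{n-1} \end{bmatrix}
= \frac{6}{h^2}
\begin{bmatrix} y_{n-1} - 2y_1 + y_2 \\ y_1 - 2y_2 + y_3 \\ \vdots \\ y_{n-3} - 2y_{n-2} + y_{n-1} \\ y_{n-2} - 2y_{n-1} + y_1 \end{bmatrix}
\]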

8. (The Clamped Spline) Suppose that, in addition to the $n$ points to be interpolated, we are given specific values $y_1'$ and $y_n'$ for the slopes $S'(x_1)$ and $S'(x_n)$ of the interpolating cubic spline at the endpoints $x_1$ and $x_n$.

(a) Show that

\[
\begin{aligned}
2M_1 + M_2 &= 6(y_2 - y_1 - hy_1')/h^2 \\
2M_n + M_{n-1} &= 6(y_{n-1} - y_n + hy_n')/h^2
\end{aligned}
\]

(b) Using the equations in part (a) and Equations 15, construct an $n \times n$ linear system for $M_1, M_2, \ldots, M_n$ in matrix form.


The  clamped  spline  described  in  this  exercise  is  the  most  accurate  type  of  spline  for  interpolation  work  if  the 
slopes  at  the  endpoints  are  known  or  can  be  estimated. 

Answer: 
Technology Exercises


The  following  exercises  are  designed  to  be  solved  using  a technology  utility.  Typically,  this  will  be  MATLAB,  Mathematica , 
Maple,  Derive,  or  Mathcad,  but  it  may  also  be  some  other  type  of  linear  algebra  software  or  a scientific  calculator  with  some 
linear  algebra  capabilities.  For  each  exercise  you  will  need  to  read  the  relevant  documentation  for  the  particular  utility  you  are 
using.  The  goal  of  these  exercises  is  to  provide  you  with  a basic  proficiency  with  your  technology  utility.  Once  you  have 
mastered  the  techniques  in  these  exercises,  you  will  be  able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the 
regular  exercise  sets. 


T1. In the solution of the natural cubic spline problem, it is necessary to solve a system of equations having coefficient matrix

\[
A_n =
\begin{bmatrix}
4 & 1 & 0 & \cdots & 0 & 0 \\
1 & 4 & 1 & \cdots & 0 & 0 \\
0 & 1 & 4 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 4 & 1 \\
0 & 0 & 0 & \cdots & 1 & 4
\end{bmatrix}
\]

If we can present a formula for the inverse of this matrix, then the solution for the natural cubic spline problem can be easily obtained. In this exercise and the next, we use a computer to discover this formula. Toward this end, we first determine an expression for the determinant of $A_n$, denoted by the symbol $D_n$. Given that

\[
A_1 = [4] \quad \text{and} \quad A_2 = \begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix}
\]

we see that

\[
D_1 = \det(A_1) = \det[4] = 4
\]

and

\[
D_2 = \det(A_2) = \det \begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix} = 15
\]

(a) Use the cofactor expansion of determinants to show that

\[
D_n = 4D_{n-1} - D_{n-2}
\]

for $n = 3, 4, 5, \ldots$. This says, for example, that

\[
\begin{aligned}
D_3 &= 4D_2 - D_1 = 4(15) - 4 = 56 \\
D_4 &= 4D_3 - D_2 = 4(56) - 15 = 209
\end{aligned}
\]

and so on. Using a computer, check this result for $5 \le n \le 10$.

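(b) By writing

\[
D_n = 4D_{n-1} - D_{n-2}
\]

and the identity, $D_{n-1} = D_{n-1}$, in matrix form,

\[
\begin{bmatrix} D_n \\ D_{n-1} \end{bmatrix}
=
\begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} D_{n-1} \\ D_{n-2} \end{bmatrix}
\]

show that

\[
\begin{bmatrix} D_n \\ D_{n-1} \end{bmatrix}
=
\begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}
\begin{bmatrix} D_2 \\ D_1 \end{bmatrix}
=
\begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}
\begin{bmatrix} 15 \\ 4 \end{bmatrix}
\]

(c) Use the methods in Section 5.2 and a computer to show that

\[
\begin{bmatrix} 4 & -1 \\ 1 & 0 \end{bmatrix}^{n-2}
= \frac{1}{2\sqrt{3}}
\begin{bmatrix}
(2+\sqrt{3})^{n-1} - (2-\sqrt{3})^{n-1} & (2-\sqrt{3})^{n-2} - (2+\sqrt{3})^{n-2} \\
(2+\sqrt{3})^{n-2} - (2-\sqrt{3})^{n-2} & (2-\sqrt{3})^{n-3} - (2+\sqrt{3})^{n-3}
\end{bmatrix}
\]

and hence

\[
D_n = \frac{(2+\sqrt{3})^{n+1} - (2-\sqrt{3})^{n+1}}{2\sqrt{3}}
\]

for $n = 1, 2, 3, \ldots$.

(d) Using a computer, check this result for $1 \le n \le 10$.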
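The computer checks requested in parts (a) and (d) might be carried out along the following lines. This Python sketch is only one possibility and its function names are hypothetical: it computes $\det(A_n)$ by exact rational elimination, by the recurrence of part (a), and by the closed form of part (c), so that the three can be compared.

```python
import math
from fractions import Fraction


def det_by_elimination(n):
    """Exact determinant of the n x n matrix A_n (4 on the diagonal, 1 beside it).
    No row interchanges are needed because A_n is strictly diagonally dominant."""
    A = [[Fraction(4 if i == j else 1 if abs(i - j) == 1 else 0)
          for j in range(n)] for i in range(n)]
    det = Fraction(1)
    for k in range(n):
        pivot = A[k][k]
        det *= pivot
        for i in range(k + 1, n):
            factor = A[i][k] / pivot
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]
    return det


def det_by_recurrence(n_max):
    """D_n = 4 D_{n-1} - D_{n-2} with D_1 = 4 and D_2 = 15."""
    D = {1: 4, 2: 15}
    for n in range(3, n_max + 1):
        D[n] = 4 * D[n - 1] - D[n - 2]
    return D


def det_closed_form(n):
    """Closed form from part (c), evaluated in floating point."""
    r = math.sqrt(3)
    return ((2 + r) ** (n + 1) - (2 - r) ** (n + 1)) / (2 * r)


D = det_by_recurrence(10)
agree = all(det_by_elimination(n) == D[n] == round(det_closed_form(n))
            for n in range(1, 11))
```

The closed form is evaluated with floating-point arithmetic, so it is rounded to the nearest integer before being compared with the exact values.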


T2. In this exercise, we determine a formula for calculating $A_n^{-1}$ from $D_k$ for $k = 0, 1, 2, 3, \ldots, n$, assuming that $D_0$ is defined to be 1.

(a) Use a computer to compute $A_k^{-1}$ for $k = 1, 2, 3, 4$, and 5.

(b) From your results in part (a), discover the conjecture that

\[
A_n^{-1} = [\alpha_{ij}]
\]

where $\alpha_{ij} = \alpha_{ji}$ and

\[
\alpha_{ij} = (-1)^{i+j}\,\frac{D_{n-j}D_{i-1}}{D_n}
\]

for $i \le j$.

(c) Use the result in part (b) to compute $A_7^{-1}$ and compare it to the result obtained using the computer.
10.5 Markov Chains

In  this  section  we  describe  a general  model  of  a system  that  changes  from  state  to  state.  We  then  apply  the  model 
to  several  concrete  problems. 


Prerequisites 

Linear  Systems 
Matrices 

Intuitive  Understanding  of  Limits 


A Markov  Process 

Suppose  a physical  or  mathematical  system  undergoes  a process  of  change  such  that  at  any  moment  it  can  occupy 
one  of  a finite  number  of  states.  For  example,  the  weather  in  a certain  city  could  be  in  one  of  three  possible 
states:  sunny,  cloudy,  or  rainy.  Or  an  individual  could  be  in  one  of  four  possible  emotional  states:  happy,  sad, 
angry,  or  apprehensive.  Suppose  that  such  a system  changes  with  time  from  one  state  to  another  and  at  scheduled 
times  the  state  of  the  system  is  observed.  If  the  state  of  the  system  at  any  observation  cannot  be  predicted  with 
certainty,  but  the  probability  that  a given  state  occurs  can  be  predicted  by  just  knowing  the  state  of  the  system  at 
the  preceding  observation,  then  the  process  of  change  is  called  a Markov  chain  or  Markov  process. 


DEFINITION  1 

If a Markov chain has $k$ possible states, which we label as $1, 2, \ldots, k$, then the probability that the system is in state $i$ at any observation after it was in state $j$ at the preceding observation is denoted by $p_{ij}$ and is called the transition probability from state $j$ to state $i$. The matrix $P = [p_{ij}]$ is called the transition matrix of the Markov chain.


For example, in a three-state Markov chain, the transition matrix has the form

\[
\begin{array}{c|ccc}
 & \multicolumn{3}{c}{\text{Preceding State}} \\
\text{New State} & 1 & 2 & 3 \\ \hline
1 & p_{11} & p_{12} & p_{13} \\
2 & p_{21} & p_{22} & p_{23} \\
3 & p_{31} & p_{32} & p_{33}
\end{array}
\]

In this matrix, $p_{32}$ is the probability that the system will change from state 2 to state 3, $p_{11}$ is the probability that the system will still be in state 1 if it was previously in state 1, and so forth.


EXAMPLE  1 Transition  Matrix  of  the  Markov  Chain 


A car  rental  agency  has  three  rental  locations,  denoted  by  1,  2,  and  3.  A customer  may  rent  a car 
from  any  of  the  three  locations  and  return  the  car  to  any  of  the  three  locations.  The  manager  finds 
that  customers  return  the  cars  to  the  various  locations  according  to  the  following  probabilities: 

\[
\begin{array}{c|ccc}
 & \multicolumn{3}{c}{\text{Rented from Location}} \\
\text{Returned to Location} & 1 & 2 & 3 \\ \hline
1 & .8 & .3 & .2 \\
2 & .1 & .2 & .6 \\
3 & .1 & .5 & .2
\end{array}
\]

This matrix is the transition matrix of the system considered as a Markov chain. From this matrix, the probability is .6 that a car rented from location 3 will be returned to location 2, the probability is .8 that a car rented from location 1 will be returned to location 1, and so forth.


EXAMPLE  2 Transition  Matrix  of  the  Markov  Chain 

By  reviewing  its  donation  records,  the  alumni  office  of  a college  finds  that  80%  of  its  alumni  who 
contribute  to  the  annual  fund  one  year  will  also  contribute  the  next  year,  and  30%  of  those  who  do 
not  contribute  one  year  will  contribute  the  next.  This  can  be  viewed  as  a Markov  chain  with  two 
states:  state  1 corresponds  to  an  alumnus  giving  a donation  in  any  one  year,  and  state  2 corresponds 
to the alumnus not giving a donation in that year. The transition matrix is

\[
P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}
\]

In  the  examples  above,  the  transition  matrices  of  the  Markov  chains  have  the  property  that  the  entries  in  any 
column  sum  to  1.  This  is  not  accidental.  If  P = [Pij]  is  the  transition  matrix  of  any  Markov  chain  with  k states, 
then  for  each  j we  must  have 


\[
p_{1j} + p_{2j} + \cdots + p_{kj} = 1
\tag{1}
\]


because  if  the  system  is  in  state  j at  one  observation,  it  is  certain  to  be  in  one  of  the  k possible  states  at  the  next 
observation. 

A matrix with property 1 is called a stochastic matrix, a probability matrix, or a Markov matrix. From the preceding discussion, it follows that the transition matrix for a Markov chain must be a stochastic matrix.
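Property 1 is easy to test by machine. The following Python sketch checks that a square matrix has nonnegative entries and that each column sums to 1; the function name `is_stochastic` and the tolerance `tol` are our own assumptions, introduced to allow for floating-point rounding.

```python
def is_stochastic(P, tol=1e-12):
    """Return True if P is square, has nonnegative entries, and every
    column sums to 1 (property 1), up to the rounding tolerance tol."""
    k = len(P)
    for row in P:
        if len(row) != k or any(entry < 0 for entry in row):
            return False
    return all(abs(sum(P[i][j] for i in range(k)) - 1.0) <= tol
               for j in range(k))


# Transition matrix of Example 1 (car rental locations).
rental = [[0.8, 0.3, 0.2],
          [0.1, 0.2, 0.6],
          [0.1, 0.5, 0.2]]

# Rows sum to 1 here, but columns do not, so property 1 fails.
row_sums_only = [[0.5, 0.5],
                 [0.6, 0.4]]

rental_ok = is_stochastic(rental)
row_sums_ok = is_stochastic(row_sums_only)
```

The second matrix is included to emphasize that, with the convention used in this section, it is the columns rather than the rows that must sum to 1.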

In  a Markov  chain,  the  state  of  the  system  at  any  observation  time  cannot  generally  be  determined  with  certainty. 
The  best  one  can  usually  do  is  specify  probabilities  for  each  of  the  possible  states.  For  example,  in  a Markov 
chain  with  three  states,  we  might  describe  the  possible  state  of  the  system  at  some  observation  time  by  a column 
vector 


\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\]

in which $x_1$ is the probability that the system is in state 1, $x_2$ the probability that it is in state 2, and $x_3$ the probability that it is in state 3. In general we make the following definition.


DEFINITION  2 

The state vector for an observation of a Markov chain with $k$ states is a column vector $\mathbf{x}$ whose $i$th component $x_i$ is the probability that the system is in the $i$th state at that time.

Observe that the entries in any state vector for a Markov chain are nonnegative and have a sum of 1. (Why?) A column vector that has this property is called a probability vector.

Let us suppose now that we know the state vector $\mathbf{x}^{(0)}$ for a Markov chain at some initial observation. The following theorem will enable us to determine the state vectors

\[
\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(n)}, \ldots
\]

at the subsequent observation times.


THEOREM  10.5.1 

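If $P$ is the transition matrix of a Markov chain and $\mathbf{x}^{(n)}$ is the state vector at the $n$th observation, then $\mathbf{x}^{(n+1)} = P\mathbf{x}^{(n)}$.

The proof of this theorem involves ideas from probability theory and will not be given here. From this theorem, it follows that

\[
\begin{aligned}
\mathbf{x}^{(1)} &= P\mathbf{x}^{(0)} \\
\mathbf{x}^{(2)} &= P\mathbf{x}^{(1)} = P^2\mathbf{x}^{(0)} \\
\mathbf{x}^{(3)} &= P\mathbf{x}^{(2)} = P^3\mathbf{x}^{(0)} \\
&\ \ \vdots \\
\mathbf{x}^{(n)} &= P\mathbf{x}^{(n-1)} = P^n\mathbf{x}^{(0)}
\end{aligned}
\]

In this way, the initial state vector $\mathbf{x}^{(0)}$ and the transition matrix $P$ determine $\mathbf{x}^{(n)}$ for $n = 1, 2, \ldots$.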

EXAMPLE  3 Example  2 Revisited 

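The transition matrix in Example 2 was

\[
P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}
\]

We now construct the probable future donation record of a new graduate who did not give a donation in the initial year after graduation. For such a graduate the system is initially in state 2 with certainty, so the initial state vector is

\[
\mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]

From Theorem 10.5.1 we then have

\[
\begin{aligned}
\mathbf{x}^{(1)} &= P\mathbf{x}^{(0)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} .3 \\ .7 \end{bmatrix} \\
\mathbf{x}^{(2)} &= P\mathbf{x}^{(1)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} .3 \\ .7 \end{bmatrix} = \begin{bmatrix} .45 \\ .55 \end{bmatrix} \\
\mathbf{x}^{(3)} &= P\mathbf{x}^{(2)} = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} .45 \\ .55 \end{bmatrix} = \begin{bmatrix} .525 \\ .475 \end{bmatrix}
\end{aligned}
\]

Thus, after three years the alumnus can be expected to make a donation with probability .525. Beyond three years, we find the following state vectors (to three decimal places):

\[
\mathbf{x}^{(4)} = \begin{bmatrix} .563 \\ .438 \end{bmatrix}, \quad
\mathbf{x}^{(5)} = \begin{bmatrix} .581 \\ .419 \end{bmatrix}, \quad
\mathbf{x}^{(6)} = \begin{bmatrix} .591 \\ .409 \end{bmatrix}, \quad
\mathbf{x}^{(7)} = \begin{bmatrix} .595 \\ .405 \end{bmatrix}
\]

\[
\mathbf{x}^{(8)} = \begin{bmatrix} .598 \\ .402 \end{bmatrix}, \quad
\mathbf{x}^{(9)} = \begin{bmatrix} .599 \\ .401 \end{bmatrix}, \quad
\mathbf{x}^{(10)} = \begin{bmatrix} .599 \\ .401 \end{bmatrix}, \quad
\mathbf{x}^{(11)} = \begin{bmatrix} .600 \\ .400 \end{bmatrix}
\]

For all $n$ beyond 11, we have

\[
\mathbf{x}^{(n)} = \begin{bmatrix} .600 \\ .400 \end{bmatrix}
\]

to three decimal places. In other words, the state vectors converge to a fixed vector as the number of observations increases. (We will discuss this further below.)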
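The repeated multiplication in this example is straightforward to automate. The following Python sketch applies Theorem 10.5.1 eleven times to the data of this example; the function name `step` and the list `history` are hypothetical names chosen for illustration.

```python
def step(P, x):
    """One application of Theorem 10.5.1: return the product P x."""
    return [sum(P[i][j] * x[j] for j in range(len(x))) for i in range(len(P))]


P = [[0.8, 0.3],
     [0.2, 0.7]]      # transition matrix of Example 2
x = [0.0, 1.0]        # initial state vector: state 2 with certainty

history = [x]         # history[n] holds the state vector x^(n)
for _ in range(11):
    x = step(P, x)
    history.append(x)

# Round to three decimal places for comparison with the values in the text.
rounded = [[round(v, 3) for v in vec] for vec in history]
```

In this sketch `history[3]` is approximately $(.525, .475)$ and `history[11]` rounds to $(.600, .400)$.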


EXAMPLE  4 Example  1 Revisited 


The transition matrix in Example 1 was

\[
P = \begin{bmatrix} .8 & .3 & .2 \\ .1 & .2 & .6 \\ .1 & .5 & .2 \end{bmatrix}
\]

If a car is rented initially from location 2, then the initial state vector is

\[
\mathbf{x}^{(0)} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}
\]

Using this vector and Theorem 10.5.1, one obtains the later state vectors listed in Table 1.

Table 1

n          0     1     2     3     4     5     6     7     8     9     10    11
x_1^(n)    0   .300  .400  .477  .511  .533  .544  .550  .553  .555  .556  .557
x_2^(n)    1   .200  .370  .252  .261  .240  .238  .233  .232  .231  .230  .230
x_3^(n)    0   .500  .230  .271  .228  .227  .219  .217  .215  .214  .214  .213

For all values of $n$ greater than 11, all state vectors are equal to $\mathbf{x}^{(11)}$ to three decimal places.

Two things should be observed in this example. First, it was not necessary to know how long a customer kept the car. That is, in a Markov process the time period between observations need not be regular. Second, the state vectors approach a fixed vector as $n$ increases, just as in the first example.


EXAMPLE 5 Using Theorem 10.5.1

A traffic officer is assigned to control the traffic at the eight intersections indicated in Figure 10.5.1. She is instructed to remain at each intersection for an hour and then to either remain at the same intersection or move to a neighboring intersection. To avoid establishing a pattern, she is told to choose her new intersection on a random basis, with each possible choice equally likely. For example, if she is at intersection 5, her next intersection can be 2, 4, 5, or 8, each with probability $\frac{1}{4}$. Every day she starts at the location where she stopped the day before. The transition matrix for this Markov chain is

\[
\begin{array}{c|cccccccc}
 & \multicolumn{8}{c}{\text{Old Intersection}} \\
\text{New Intersection} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline
1 & \frac13 & \frac13 & 0 & \frac15 & 0 & 0 & 0 & 0 \\
2 & \frac13 & \frac13 & 0 & 0 & \frac14 & 0 & 0 & 0 \\
3 & 0 & 0 & \frac13 & \frac15 & 0 & \frac13 & 0 & 0 \\
4 & \frac13 & 0 & \frac13 & \frac15 & \frac14 & 0 & \frac14 & 0 \\
5 & 0 & \frac13 & 0 & \frac15 & \frac14 & 0 & 0 & \frac13 \\
6 & 0 & 0 & \frac13 & 0 & 0 & \frac13 & \frac14 & 0 \\
7 & 0 & 0 & 0 & \frac15 & 0 & \frac13 & \frac14 & \frac13 \\
8 & 0 & 0 & 0 & 0 & \frac14 & 0 & \frac14 & \frac13
\end{array}
\]

Figure 10.5.1

If the traffic officer begins at intersection 5, her probable locations, hour by hour, are given by the state vectors given in Table 2. For all values of $n$ greater than 22, all state vectors are equal to $\mathbf{x}^{(22)}$ to three decimal places. Thus, as with the first two examples, the state vectors approach a fixed vector as $n$ increases.


Table 2

n          0     1     2     3     4     5     10    15    20    22
x_1^(n)    0   .000  .133  .116  .130  .123  .113  .109  .108  .107
x_2^(n)    0   .250  .146  .163  .140  .138  .115  .109  .108  .107
x_3^(n)    0   .000  .050  .039  .067  .073  .100  .106  .107  .107
x_4^(n)    0   .250  .113  .187  .162  .178  .178  .179  .179  .179
x_5^(n)    1   .250  .279  .190  .190  .168  .149  .144  .143  .143
x_6^(n)    0   .000  .000  .050  .056  .074  .099  .105  .107  .107
x_7^(n)    0   .000  .133  .104  .131  .125  .138  .142  .143  .143
x_8^(n)    0   .250  .146  .152  .124  .121  .108  .107  .107  .107

Limiting  Behavior  of  the  State  Vectors 

In  our  examples  we  saw  that  the  state  vectors  approached  some  fixed  vector  as  the  number  of  observations 
increased.  We  now  ask  whether  the  state  vectors  always  approach  a fixed  vector  in  a Markov  chain.  A simple 
example  shows  that  this  is  not  the  case. 

EXAMPLE  6 System  Oscillates  Between  Two  State  Vectors 

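Let

\[
P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\quad \text{and} \quad
\mathbf{x}^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

Then, because $P^2 = I$ and $P^3 = P$, we have that

\[
\mathbf{x}^{(0)} = \mathbf{x}^{(2)} = \mathbf{x}^{(4)} = \cdots = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\]

and

\[
\mathbf{x}^{(1)} = \mathbf{x}^{(3)} = \mathbf{x}^{(5)} = \cdots = \begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]

This system oscillates indefinitely between the two state vectors $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, so it does not approach any fixed vector.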


However,  if  we  impose  a mild  condition  on  the  transition  matrix,  we  can  show  that  a fixed  limiting  state  vector  is 
approached.  This  condition  is  described  by  the  following  definition. 


DEFINITION  3 

A transition  matrix  is  regular  if  some  integer  power  of  it  has  all  positive  entries. 


Thus, for a regular transition matrix P, there is some positive integer m such that all entries of $P^m$ are positive.
This is the case with the transition matrices of Examples 1 and 2 for m = 1. In Example 5 it turns out that $P^4$ has
all positive entries. Consequently, in all three examples the transition matrices are regular.
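
Definition 3 can be tested mechanically. The sketch below is an illustration under stated assumptions: the function name `regularity_power` is hypothetical, and the default search limit $(k-1)^2 + 1$ is Wielandt's classical bound on the exponent of a primitive $k \times k$ matrix, which is a known result not stated in this text. Only the pattern of zero and nonzero entries matters, so the code works with 0/1 matrices.

```python
import numpy as np

def regularity_power(P, max_power=None):
    """Return the smallest m with all entries of P^m positive, or None."""
    P = np.asarray(P, dtype=float)
    k = P.shape[0]
    if max_power is None:
        max_power = (k - 1) ** 2 + 1        # Wielandt's bound (assumption, see lead-in)
    pattern = (P > 0).astype(int)           # 1 where an entry is positive
    power = pattern.copy()                  # zero pattern of P^1
    for m in range(1, max_power + 1):
        if np.all(power > 0):
            return m
        power = ((power @ pattern) > 0).astype(int)   # zero pattern of P^(m+1)
    return None

P_example1 = [[.8, .3, .2], [.1, .2, .6], [.1, .5, .2]]
P_example6 = [[0, 1], [1, 0]]
print(regularity_power(P_example1))   # regular with m = 1
print(regularity_power(P_example6))   # None: the matrix of Example 6 is not regular
```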

A Markov chain that is governed by a regular transition matrix is called a regular Markov chain. We will see
that every regular Markov chain has a fixed state vector q such that $P^n x^{(0)}$ approaches q as n increases for any
choice of $x^{(0)}$. This result is of major importance in the theory of Markov chains. It is based on the following
theorem.


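THEOREM 10.5.2 Behavior of $P^n$ as $n \to \infty$

If P is a regular transition matrix, then as $n \to \infty$,

$$P^n \to \begin{bmatrix} q_1 & q_1 & \cdots & q_1 \\ q_2 & q_2 & \cdots & q_2 \\ \vdots & \vdots & & \vdots \\ q_k & q_k & \cdots & q_k \end{bmatrix}$$

where the $q_i$ are positive numbers such that $q_1 + q_2 + \cdots + q_k = 1$.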

We will not prove this theorem here. We refer you to a more specialized text, such as J. Kemeny and J. Snell,
Finite Markov Chains (New York: Springer-Verlag, 1976).


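Let us set

$$Q = \begin{bmatrix} q_1 & q_1 & \cdots & q_1 \\ q_2 & q_2 & \cdots & q_2 \\ \vdots & \vdots & & \vdots \\ q_k & q_k & \cdots & q_k \end{bmatrix} \quad \text{and} \quad q = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix}$$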

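Thus, Q is a transition matrix, all of whose columns are equal to the probability vector q. Q has the property that
if x is any probability vector, then

$$Qx = \begin{bmatrix} q_1 & q_1 & \cdots & q_1 \\ q_2 & q_2 & \cdots & q_2 \\ \vdots & \vdots & & \vdots \\ q_k & q_k & \cdots & q_k \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix} = \begin{bmatrix} q_1 x_1 + q_1 x_2 + \cdots + q_1 x_k \\ q_2 x_1 + q_2 x_2 + \cdots + q_2 x_k \\ \vdots \\ q_k x_1 + q_k x_2 + \cdots + q_k x_k \end{bmatrix} = (x_1 + x_2 + \cdots + x_k) \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix} = (1)q = q$$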


That  is,  Q transforms  any  probability  vector  x into  the  fixed  probability  vector  q.  This  result  leads  to  the 
following  theorem. 


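THEOREM 10.5.3 Behavior of $P^n x$ as $n \to \infty$

If P is a regular transition matrix and x is any probability vector, then as $n \to \infty$,

$$P^n x \to \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_k \end{bmatrix} = q$$

where q is a fixed probability vector, independent of n, all of whose entries are positive.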


This result holds since Theorem 10.5.2 implies that $P^n \to Q$ as $n \to \infty$. This in turn implies that $P^n x \to Qx = q$
as $n \to \infty$. Thus, for a regular Markov chain, the system eventually approaches a fixed state vector q. The vector
q is called the steady-state vector of the regular Markov chain.
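
The contrast between a regular chain and the oscillating chain of Example 6 can be observed numerically. This is a minimal sketch, assuming NumPy is available. The helper name `iterate_states` is hypothetical, and the two matrices are the transition matrices of Example 1 and Example 6.

```python
import numpy as np

def iterate_states(P, x0, n):
    """Return x^(n) = P^n x^(0), computed by n successive multiplications."""
    P = np.asarray(P, dtype=float)
    x = np.asarray(x0, dtype=float)
    for _ in range(n):
        x = P @ x
    return x

P_regular = np.array([[.8, .3, .2],
                      [.1, .2, .6],
                      [.1, .5, .2]])      # Example 1 (regular)
P_swap = np.array([[0., 1.],
                   [1., 0.]])             # Example 6 (not regular)

# Two different starting vectors lead to the same limit for the regular chain.
a = iterate_states(P_regular, [1, 0, 0], 50)
b = iterate_states(P_regular, [0, 0, 1], 50)
print(a, b)

# The chain of Example 6 keeps alternating between two state vectors.
print(iterate_states(P_swap, [1, 0], 7), iterate_states(P_swap, [1, 0], 8))
```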

For systems with many states, usually the most efficient technique of computing the steady-state vector q is
simply to calculate $P^n x$ for some large n. Our examples illustrate this procedure. Each is a regular Markov
process, so that convergence to a steady-state vector is ensured. Another way of computing the steady-state
vector is to make use of the following theorem.


THEOREM 10.5.4 Steady-State Vector

The steady-state vector q of a regular transition matrix P is the unique probability vector that satisfies the
equation Pq = q.


To see this, consider the matrix identity $PP^n = P^{n+1}$. By Theorem 10.5.2, both $P^n$ and $P^{n+1}$ approach Q as
$n \to \infty$. Thus, we have PQ = Q. Any one column of this matrix equation gives Pq = q. To show that q is the
only probability vector that satisfies this equation, suppose r is another probability vector such that Pr = r. Then
also $P^n r = r$ for n = 1, 2, .... When we let $n \to \infty$, Theorem 10.5.3 leads to q = r.


Theorem 10.5.4 can also be expressed by the statement that the homogeneous linear system

$$(I - P)q = 0$$

has a unique solution vector q with nonnegative entries that satisfy the condition $q_1 + q_2 + \cdots + q_k = 1$. We can
apply this technique to the computation of the steady-state vectors for our examples.
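
One way to carry out this computation numerically is sketched below. It is an illustration rather than the text's method: the function name `steady_state` is hypothetical, and the choice to overwrite the last equation of $(I - P)q = 0$ with the normalization $q_1 + q_2 + \cdots + q_k = 1$ is a design decision of this sketch. For a regular P the equations of the homogeneous system are dependent, so dropping one of them loses no information, and the resulting square system can be handed to a standard solver.

```python
import numpy as np

def steady_state(P):
    """Solve (I - P) q = 0 together with q_1 + ... + q_k = 1."""
    P = np.asarray(P, dtype=float)
    k = P.shape[0]
    A = np.eye(k) - P
    A[-1, :] = 1.0            # replace the last equation by the normalization
    b = np.zeros(k)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

q2 = steady_state([[.8, .3], [.2, .7]])                            # Example 2
q1 = steady_state([[.8, .3, .2], [.1, .2, .6], [.1, .5, .2]])      # Example 1
print(q2)
print(q1)
```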

EXAMPLE  7 Example  2 Revisited 


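In Example 2 the transition matrix was

$$P = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}$$

so the linear system $(I - P)q = 0$ is

$$\begin{bmatrix} .2 & -.3 \\ -.2 & .3 \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{2}$$

This leads to the single independent equation

$$.2q_1 - .3q_2 = 0$$

or

$$q_1 = 1.5q_2$$

Thus, when we set $q_2 = s$, any solution of (2) is of the form

$$q = s \begin{bmatrix} 1.5 \\ 1 \end{bmatrix}$$

where s is an arbitrary constant. To make the vector q a probability vector, we set
s = 1 / (1.5 + 1) = .4. Consequently,

$$q = \begin{bmatrix} .6 \\ .4 \end{bmatrix}$$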

is  the  steady-state  vector  of  this  regular  Markov  chain.  This  means  that  over  the  long  run,  60%  of 
the  alumni  will  give  a donation  in  any  one  year,  and  40%  will  not.  Observe  that  this  agrees  with  the 
result  obtained  numerically  in  Example  3. 


EXAMPLE  8 Example  1 Revisited 


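In Example 1 the transition matrix was

$$P = \begin{bmatrix} .8 & .3 & .2 \\ .1 & .2 & .6 \\ .1 & .5 & .2 \end{bmatrix}$$

so the linear system $(I - P)q = 0$ is

$$\begin{bmatrix} .2 & -.3 & -.2 \\ -.1 & .8 & -.6 \\ -.1 & -.5 & .8 \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \\ q_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

The reduced row echelon form of the coefficient matrix is (verify)

$$\begin{bmatrix} 1 & 0 & -\frac{34}{13} \\ 0 & 1 & -\frac{14}{13} \\ 0 & 0 & 0 \end{bmatrix}$$

so the original linear system is equivalent to the system

$$q_1 = \tfrac{34}{13} q_3, \qquad q_2 = \tfrac{14}{13} q_3$$

When we set $q_3 = s$, any solution of the linear system is of the form

$$q = s \begin{bmatrix} \frac{34}{13} \\ \frac{14}{13} \\ 1 \end{bmatrix}$$

To make this a probability vector, we set

$$s = \frac{1}{\frac{34}{13} + \frac{14}{13} + 1} = \frac{13}{61}$$

Thus, the steady-state vector of the system is

$$q = \begin{bmatrix} \frac{34}{61} \\ \frac{14}{61} \\ \frac{13}{61} \end{bmatrix} = \begin{bmatrix} .5573\ldots \\ .2295\ldots \\ .2131\ldots \end{bmatrix}$$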

This  agrees  with  the  result  obtained  numerically  in  Table  1 . The  entries  of  q give  the  long-run 
probabilities  that  any  one  car  will  be  returned  to  location  1,  2,  or  3,  respectively.  If  the  car  rental 
agency  has  a fleet  of  1000  cars,  it  should  design  its  facilities  so  that  there  are  at  least  558  spaces  at 
location  1,  at  least  230  spaces  at  location  2,  and  at  least  214  spaces  at  location  3. 


EXAMPLE  9 Example  5 Revisited 

We  will  not  give  the  details  of  the  calculations  but  simply  state  that  the  unique  probability  vector 
solution  of  the  linear  system  (/  — P)  q = 0 is 


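$$q = \begin{bmatrix} \frac{3}{28} \\ \frac{3}{28} \\ \frac{3}{28} \\ \frac{5}{28} \\ \frac{4}{28} \\ \frac{3}{28} \\ \frac{4}{28} \\ \frac{3}{28} \end{bmatrix} = \begin{bmatrix} .1071\ldots \\ .1071\ldots \\ .1071\ldots \\ .1785\ldots \\ .1428\ldots \\ .1071\ldots \\ .1428\ldots \\ .1071\ldots \end{bmatrix}$$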


The  entries  in  this  vector  indicate  the  proportion  of  time  the  traffic  officer  spends  at  each 
intersection  over  the  long  term.  Thus,  if  the  objective  is  for  her  to  spend  the  same  proportion  of 
time  at  each  intersection,  then  the  strategy  of  random  movement  with  equal  probabilities  from  one 
intersection  to  another  is  not  a good  one.  (See  Exercise  5.) 


Exercise  Set  10.5 

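1. Consider the transition matrix

$$P = \begin{bmatrix} .4 & .5 \\ .6 & .5 \end{bmatrix}$$

(a) Calculate $x^{(n)}$ for n = 1, 2, 3, 4, 5 if $x^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

(b) State why P is regular and find its steady-state vector.

Answer:

(a) $x^{(1)} = \begin{bmatrix} .4 \\ .6 \end{bmatrix}$, $x^{(2)} = \begin{bmatrix} .46 \\ .54 \end{bmatrix}$, $x^{(3)} = \begin{bmatrix} .454 \\ .546 \end{bmatrix}$, $x^{(4)} = \begin{bmatrix} .4546 \\ .5454 \end{bmatrix}$, $x^{(5)} = \begin{bmatrix} .45454 \\ .54546 \end{bmatrix}$

(b) P is regular since all entries of P are positive; $q = \begin{bmatrix} \frac{5}{11} \\ \frac{6}{11} \end{bmatrix}$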


2. Consider the transition matrix

$$P = \begin{bmatrix} .2 & .1 & .7 \\ .6 & .4 & .2 \\ .2 & .5 & .1 \end{bmatrix}$$

(a) Calculate $x^{(1)}$, $x^{(2)}$, and $x^{(3)}$ to three decimal places if

roi 


(b)  State  why  P is  regular  and  find  its  steady-state  vector. 


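(a) $\begin{bmatrix} \frac{9}{17} \\ \frac{8}{17} \end{bmatrix}$

(b) $\begin{bmatrix} \frac{26}{45} \\ \frac{19}{45} \end{bmatrix}$

(c) $\begin{bmatrix} \frac{3}{19} \\ \frac{4}{19} \\ \frac{12}{19} \end{bmatrix}$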

4.  Let  P be  the  transition  matrix 


(a)  Show  that  P is  not  regular. 


(b) Show that as n increases, $P^n x^{(0)}$ approaches $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ for any initial state vector $x^{(0)}$.


(c)  What  conclusion  of  Theorem  10.5.3  is  not  valid  for  the  steady  state  of  this  transition  matrix? 


Answer: 


(a) 


P”  = 


&r 

-&r 


, n = 1, 2, .... Thus, no integer power of P has all positive entries.


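(b) $P^n \to \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}$ as n increases, so $P^n x^{(0)} \to \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ for any $x^{(0)}$ as n increases.

(c) The entries of the limiting vector $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ are not all positive.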


5. Verify that if P is a $k \times k$ regular transition matrix all of whose row sums are equal to 1, then the entries of its
steady-state vector are all equal to 1/k.

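6. Show that the transition matrix

$$P = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} \end{bmatrix}$$

is regular, and use Exercise 5 to find its steady-state vector.

Answer:

$$P^2 = \begin{bmatrix} \frac{1}{2} & \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{2} \end{bmatrix}$$

has all positive entries; $q = \begin{bmatrix} \frac{1}{3} \\ \frac{1}{3} \\ \frac{1}{3} \end{bmatrix}$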

7.  John  is  either  happy  or  sad.  If  he  is  happy  one  day,  then  he  is  happy  the  next  day  four  times  out  of  five.  If  he 
is  sad  one  day,  then  he  is  sad  the  next  day  one  time  out  of  three.  Over  the  long  term,  what  are  the  chances  that 
John  is  happy  on  any  given  day? 

Answer:

$\frac{10}{13}$

8.  A country  is  divided  into  three  demographic  regions.  It  is  found  that  each  year  5%  of  the  residents  of  region  1 
move  to  region  2,  and  5%  move  to  region  3.  Of  the  residents  of  region  2,  15%  move  to  region  1 and  10% 
move  to  region  3.  And  of  the  residents  of  region  3,  10%  move  to  region  1 and  5%  move  to  region  2.  What 
percentage  of  the  population  resides  in  each  of  the  three  regions  after  a long  period  of  time? 

Answer:

$54\frac{1}{6}\%$ in region 1, $16\frac{2}{3}\%$ in region 2, and $29\frac{1}{6}\%$ in region 3

Technology  Exercises 

The  following  exercises  are  designed  to  be  solved  using  a technology  utility.  Typically,  this  will  be  MATLAB, 
Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a
scientific  calculator  with  some  linear  algebra  capabilities.  For  each  exercise  you  will  need  to  read  the  relevant 
documentation  for  the  particular  utility  you  are  using.  The  goal  of  these  exercises  is  to  provide  you  with  a basic 
proficiency  with  your  technology  utility.  Once  you  have  mastered  the  techniques  in  these  exercises,  you  will  be 
able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the  regular  exercise  sets. 

T1. Consider the sequence of transition matrices

$$\{P_2, P_3, P_4, \ldots\}$$


with 
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0 0 0 1 ]• 
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0 0 ± 4 j 
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0 -k  4-  ± -±- 


1 4 


2 3 4 

1 1 I 

3 4 


and  so  on. 

(a)  Use  a computer  to  show  that  each  of  these  four  matrices  is  regular  by  computing  their  squares. 

(b) Verify Theorem 10.5.2 by computing the 100th power of $P_k$ for k = 2, 3, 4, 5. Then make a conjecture as to
the limiting value of $P_k^n$ as $n \to \infty$ for all k = 2, 3, 4, ....

(c) Verify that the common column $q_k$ of the limiting matrix you found in part (b) satisfies the equation
$P_k q_k = q_k$, as required by Theorem 10.5.4.


T2.  A mouse  is  placed  in  a box  with  nine  rooms  as  shown  in  the  accompanying  figure.  Assume  that  it  is  equally 
likely  that  the  mouse  goes  through  any  door  in  the  room  or  stays  in  the  room. 

(a)  Construct  the  9 x 9 transition  matrix  for  this  problem  and  show  that  it  is  regular. 

(b)  Determine  the  steady-state  vector  for  the  matrix. 

(c)  Use  a symmetry  argument  to  show  that  this  problem  may  be  solved  using  only  a 3 x 3 matrix. 


Figure  Ex-T2 
10.6 Graph Theory

In  this  section  we  introduce  matrix  representations  of  relations  among  members  of  a set.  We  use  matrix 
arithmetic  to  analyze  these  relationships. 


Prerequisites 

Matrix  Addition  and  Multiplication 


Relations  Among  Members  of  a Set 

There  are  countless  examples  of  sets  with  finitely  many  members  in  which  some  relation  exists  among 
members  of  the  set.  For  example,  the  set  could  consist  of  a collection  of  people,  animals,  countries, 
companies,  sports  teams,  or  cities;  and  the  relation  between  two  members,  A and  B , of  such  a set  could  be  that 
person  A dominates  person  B , animal  A feeds  on  animal  B , country  A militarily  supports  country  B , company 
A sells  its  product  to  company  B , sports  team  A consistently  beats  sports  team  B , or  city  A has  a direct  airline 
flight  to  city  B. 

We  will  now  show  how  the  theory  of  directed  graphs  can  be  used  to  mathematically  model  relations  such  as 
those  in  the  preceding  examples. 


Directed  Graphs 

A directed graph is a finite set of elements, $\{P_1, P_2, \ldots, P_n\}$, together with a finite collection of ordered
pairs $(P_i, P_j)$ of distinct elements of this set, with no ordered pair being repeated. The elements of the set are
called vertices, and the ordered pairs are called directed edges, of the directed graph. We use the notation
$P_i \to P_j$ (which is read "$P_i$ is connected to $P_j$") to indicate that the directed edge $(P_i, P_j)$ belongs to the
directed graph. Geometrically, we can visualize a directed graph (Figure 10.6.1) by representing the vertices
as points in the plane and representing the directed edge $P_i \to P_j$ by drawing a line or arc from vertex $P_i$ to
vertex $P_j$, with an arrow pointing from $P_i$ to $P_j$. If both $P_i \to P_j$ and $P_j \to P_i$ hold (denoted $P_i \leftrightarrow P_j$), we
draw a single line between $P_i$ and $P_j$ with two oppositely pointing arrows (as with $P_2$ and $P_3$ in the figure).


Figure  10.6.1 


As in Figure 10.6.1, for example, a directed graph may have separate "components" of vertices that are
connected only among themselves; and some vertices, such as $P_5$, may not be connected with any other
vertex. Also, because $P_i \to P_i$ is not permitted in a directed graph, a vertex cannot be connected with itself by
a single arc that does not pass through any other vertex.

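Figure 10.6.2 shows diagrams representing three more examples of directed graphs. With a directed graph
having n vertices, we may associate an $n \times n$ matrix $M = [m_{ij}]$, called the vertex matrix of the directed
graph. Its elements are defined by

$$m_{ij} = \begin{cases} 1, & \text{if } P_i \to P_j \\ 0, & \text{otherwise} \end{cases}$$

for i, j = 1, 2, ..., n. For the three directed graphs in Figure 10.6.2, the corresponding vertex matrices are

$$\text{Figure a: } M = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

$$\text{Figure b: } M = \begin{bmatrix} 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 \end{bmatrix}$$

$$\text{Figure c: } M = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}$$

Figure 10.6.2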


By  their  definition,  vertex  matrices  have  the  following  two  properties: 

(i) All entries are either 0 or 1.

(ii) All diagonal entries are 0.

Conversely,  any  matrix  with  these  two  properties  determines  a unique  directed  graph  having  the  given  matrix 
as  its  vertex  matrix.  For  example,  the  matrix 


$$M = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

determines  the  directed  graph  in  Figure  10.6.3. 


Figure  10.6.3 


EXAMPLE  1 Influences  Within  a Family 


A certain  family  consists  of  a mother,  father,  daughter,  and  two  sons.  The  family  members  have 
influence,  or  power,  over  each  other  in  the  following  ways:  the  mother  can  influence  the 
daughter  and  the  oldest  son;  the  father  can  influence  the  two  sons;  the  daughter  can  influence 
the  father;  the  oldest  son  can  influence  the  youngest  son;  and  the  youngest  son  can  influence  the 
mother.  We  may  model  this  family  influence  pattern  with  a directed  graph  whose  vertices  are 
the five family members. If family member A influences family member B, we write $A \to B$.
Figure 10.6.4 is the resulting directed graph, where we have used obvious letter designations for
the five family members. The vertex matrix of this directed graph is

         M    F    D    OS   YS
   M     0    0    1    1    0
   F     0    0    0    1    1
   D     0    1    0    0    0
   OS    0    0    0    0    1
   YS    1    0    0    0    0

Figure 10.6.4
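
The vertex matrix of Example 1 can be assembled directly from the list of influences. The sketch below assumes NumPy; the helper name `vertex_matrix` and the variable names are hypothetical. It enforces the rule that a vertex cannot be connected to itself.

```python
import numpy as np

members = ["M", "F", "D", "OS", "YS"]
# Each pair (A, B) records that A influences B, i.e. A -> B.
influences = [("M", "D"), ("M", "OS"), ("F", "OS"), ("F", "YS"),
              ("D", "F"), ("OS", "YS"), ("YS", "M")]

def vertex_matrix(vertices, edges):
    """Build the 0/1 vertex matrix with m_ij = 1 exactly when P_i -> P_j."""
    index = {v: i for i, v in enumerate(vertices)}
    M = np.zeros((len(vertices), len(vertices)), dtype=int)
    for a, b in edges:
        if a == b:
            raise ValueError("a vertex cannot be connected to itself")
        M[index[a], index[b]] = 1
    return M

M = vertex_matrix(members, influences)
print(M)
```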


EXAMPLE  2 Vertex  Matrix:  Moves  on  a Chessboard 

In  chess  the  knight  moves  in  an  “L”-shaped  pattern  about  the  chessboard.  For  the  board  in 
Figure  10.6.5  it  may  move  horizontally  two  squares  and  then  vertically  one  square,  or  it  may 
move  vertically  two  squares  and  then  horizontally  one  square.  Thus,  from  the  center  square  in 
the  figure,  the  knight  may  move  to  any  of  the  eight  marked  shaded  squares.  Suppose  that  the 
knight is restricted to the nine numbered squares in Figure 10.6.6. If by $i \to j$ we mean that the
knight may move from square i to square j, the directed graph in Figure 10.6.7 illustrates all
possible moves that the knight may make among these nine squares. In Figure 10.6.8 we have
"unraveled" Figure 10.6.7 to make the pattern of possible moves clearer.

The vertex matrix of this directed graph is given by

Figure 10.6.5

Figure 10.6.6

Figure 10.6.7

Figure 10.6.8

In Example 1 the father cannot directly influence the mother; that is, $F \to M$ is not true. But he can influence
the youngest son, who can then influence the mother. We write this as $F \to YS \to M$ and call it a 2-step
connection from F to M. Analogously, we call $M \to D$ a 1-step connection, $F \to OS \to YS \to M$ a 3-step
connection, and so forth. Let us now consider a technique for finding the number of all possible r-step
connections (r = 1, 2, ...) from one vertex $P_i$ to another vertex $P_j$ of an arbitrary directed graph. (This will
include the case when $P_i$ and $P_j$ are the same vertex.) The number of 1-step connections from $P_i$ to $P_j$ is
simply $m_{ij}$. That is, there is either zero or one 1-step connection from $P_i$ to $P_j$, depending on whether $m_{ij}$ is
zero or one. For the number of 2-step connections, we consider the square of the vertex matrix. If we let $m_{ij}^{(2)}$
be the (i, j)-th element of $M^2$, we have

$$m_{ij}^{(2)} = m_{i1}m_{1j} + m_{i2}m_{2j} + \cdots + m_{in}m_{nj} \tag{1}$$

Now, if $m_{i1} = m_{1j} = 1$, there is a 2-step connection $P_i \to P_1 \to P_j$ from $P_i$ to $P_j$. But if either $m_{i1}$ or $m_{1j}$ is
zero, such a 2-step connection is not possible. Thus $P_i \to P_1 \to P_j$ is a 2-step connection if and only if
$m_{i1}m_{1j} = 1$. Similarly, for any k = 1, 2, ..., n, $P_i \to P_k \to P_j$ is a 2-step connection from $P_i$ to $P_j$ if and
only if the term $m_{ik}m_{kj}$ on the right side of (1) is one; otherwise, the term is zero. Thus, the right side of (1) is
the total number of 2-step connections from $P_i$ to $P_j$.

A similar argument will work for finding the number of 3-, 4-, ..., r-step connections from $P_i$ to $P_j$. In
general, we have the following result.


THEOREM 10.6.1

Let M be the vertex matrix of a directed graph and let $m_{ij}^{(r)}$ be the (i, j)-th element of $M^r$. Then $m_{ij}^{(r)}$
is equal to the number of r-step connections from $P_i$ to $P_j$.
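
Theorem 10.6.1 translates directly into a matrix power. The following sketch assumes NumPy and uses the family vertex matrix of Example 1 (rows and columns in the order M, F, D, OS, YS). The function names are hypothetical. A brute-force enumeration of all intermediate vertices is included only as a cross-check of the matrix computation.

```python
import itertools
import numpy as np

M = np.array([[0, 0, 1, 1, 0],    # M  -> D, OS
              [0, 0, 0, 1, 1],    # F  -> OS, YS
              [0, 1, 0, 0, 0],    # D  -> F
              [0, 0, 0, 0, 1],    # OS -> YS
              [1, 0, 0, 0, 0]])   # YS -> M

def count_connections(M, i, j, r):
    """Number of r-step connections from vertex i to vertex j (0-based indices)."""
    return int(np.linalg.matrix_power(M, r)[i, j])

def count_by_enumeration(M, i, j, r):
    """Same count, found by listing every possible chain of r - 1 middle vertices."""
    n = len(M)
    total = 0
    for middle in itertools.product(range(n), repeat=r - 1):
        path = (i, *middle, j)
        if all(M[a, b] == 1 for a, b in zip(path, path[1:])):
            total += 1
    return total

F, MOTHER = 1, 0
for r in (1, 2, 3):
    print(r, count_connections(M, F, MOTHER, r), count_by_enumeration(M, F, MOTHER, r))
```

For r = 2 and r = 3 the single connections found are $F \to YS \to M$ and $F \to OS \to YS \to M$, the two chains named in the discussion of Example 1.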


EXAMPLE  3 Using  Theorem  10.6.1 


Figure 10.6.9 is the route map of a small airline that services the four cities $P_1$, $P_2$, $P_3$, $P_4$. As
a directed graph, its vertex matrix is


M = 


We  have  that 
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f 

'l 

3 

3 

l" 

1 

1 

1 

1 

and 
If we are interested in connections from city $P_4$ to city $P_3$, we may use Theorem 10.6.1 to find
their number. Because $m_{43} = 1$, there is one 1-step connection; because $m_{43}^{(2)} = 1$, there is one
2-step connection; and because $m_{43}^{(3)} = 3$, there are three 3-step connections. To verify this,
from Figure 10.6.9 we find


1-step  connections  from  P 4 to  P3: 

P 4 

->P3 

2 -step  connections  from  P 4 to  P3: 

P 4 

-»P3 

3 -step  connections  from  P4  to  P3: 

P 4 

->P3 

-»P4 

-P3 

P 4 

-P3 

P 4 

->P3 

-Pi 

-P3 

P 4 

Figure  10.6.9 


Cliques 

In  everyday  language  a “clique”  is  a closely  knit  group  of  people  (usually  three  or  more)  that  tends  to 
communicate  within  itself  and  has  no  place  for  outsiders.  In  graph  theory  this  concept  is  given  a more  precise 
meaning. 


DEFINITION  1 


A subset of a directed graph is called a clique if it satisfies the following three conditions:

(i) The subset contains at least three vertices.

(ii) For each pair of vertices $P_i$ and $P_j$ in the subset, both $P_i \to P_j$ and $P_j \to P_i$ are true.

(iii) The subset is as large as possible; that is, it is not possible to add another vertex to the subset and
still satisfy condition (ii).

This definition suggests that cliques are maximal subsets that are in perfect "communication" with each other.
For example, if the vertices represent cities, and $P_i \to P_j$ means that there is a direct airline flight from city
$P_i$ to city $P_j$, then there is a direct flight between any two cities within a clique in either direction.

EXAMPLE  4 A Directed  Graph  with  Two  Cliques 

The  directed  graph  illustrated  in  Figure  10.6.10  (which  might  represent  the  route  map  of  an 
airline)  has  two  cliques: 

{■Pi , Pi>  Pa)  and  {P3.  PA’  P 6 ) 

This  example  shows  that  a directed  graph  may  contain  several  cliques  and  that  a vertex  may 
simultaneously  belong  to  more  than  one  clique. 



Figure  10.6.10 


For simple directed graphs, cliques can be found by inspection. But for large directed graphs, it would be
desirable to have a systematic procedure for detecting cliques. For this purpose, it will be helpful to define a
matrix $S = [s_{ij}]$ related to a given directed graph as follows:

$$s_{ij} = \begin{cases} 1, & \text{if } P_i \leftrightarrow P_j \\ 0, & \text{otherwise} \end{cases}$$

The matrix S determines a directed graph that is the same as the given directed graph, with the exception that
the directed edges with only one arrow are deleted. For example, if the original directed graph is given by
Figure 10.6.11a, the directed graph that has S as its vertex matrix is given in Figure 10.6.11b. The matrix S
may be obtained from the vertex matrix M of the original directed graph by setting $s_{ij} = 1$ if $m_{ij} = m_{ji} = 1$
and setting $s_{ij} = 0$ otherwise.

Figure 10.6.11


The  following  theorem,  which  uses  the  matrix  S,  is  helpful  for  identifying  cliques. 


THEOREM 10.6.2 Identifying Cliques

Let $s_{ij}^{(3)}$ be the (i, j)-th element of $S^3$. Then a vertex $P_i$ belongs to some clique if and only if $s_{ii}^{(3)} \neq 0$.


If $s_{ii}^{(3)} \neq 0$, then there is at least one 3-step connection from $P_i$ to itself in the modified directed graph
determined by S. Suppose it is $P_i \to P_j \to P_k \to P_i$. In the modified directed graph, all directed relations are
two-way, so we also have the connections $P_i \leftrightarrow P_j \leftrightarrow P_k \leftrightarrow P_i$. But this means that $\{P_i, P_j, P_k\}$ is either a
clique or a subset of a clique. In either case, $P_i$ must belong to some clique. The converse statement, "if $P_i$
belongs to a clique, then $s_{ii}^{(3)} \neq 0$," follows in a similar manner.
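
The test in Theorem 10.6.2 takes only a few lines to carry out. This is a minimal sketch assuming NumPy. The function name `clique_vertices` is hypothetical, and the sample matrix is the 5 × 5 vertex matrix used in Example 6. The matrix S is formed as the entrywise product of M with its transpose, which equals 1 exactly where $m_{ij} = m_{ji} = 1$.

```python
import numpy as np

def clique_vertices(M):
    """Return the 1-based labels i for which the (i, i) entry of S^3 is nonzero."""
    M = np.asarray(M, dtype=int)
    S = M * M.T                              # keep only the two-way connections
    S3 = np.linalg.matrix_power(S, 3)
    return [i + 1 for i in range(len(M)) if S3[i, i] != 0]

M = [[0, 1, 0, 1, 1],
     [1, 0, 0, 1, 0],
     [1, 1, 0, 1, 0],
     [1, 1, 0, 0, 0],
     [1, 0, 0, 1, 0]]
print(clique_vertices(M))   # vertices that belong to some clique
```

Note that the theorem only identifies which vertices lie in some clique. Grouping those vertices into the cliques themselves still requires checking condition (ii) of Definition 1.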


EXAMPLE 5 Using Theorem 10.6.2


Suppose  that  a directed  graph  has  as  its  vertex  matrix 

0 1 1 


M = 


1 0 1 
0 1 0 
1 0 0 


Then 
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S= 


Because all diagonal entries of $S^3$ are zero, it follows from Theorem 10.6.2 that the directed
graph has no cliques.


EXAMPLE  6 Using  Theorem  10.6.2 


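Suppose that a directed graph has as its vertex matrix

$$M = \begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \end{bmatrix}$$

Then

$$S = \begin{bmatrix} 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad S^3 = \begin{bmatrix} 2 & 4 & 0 & 4 & 3 \\ 4 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 4 & 3 & 0 & 2 & 1 \\ 3 & 1 & 0 & 1 & 0 \end{bmatrix}$$

The nonzero diagonal entries of $S^3$ are $s_{11}^{(3)}$, $s_{22}^{(3)}$, and $s_{44}^{(3)}$. Consequently, in the given directed
graph, $P_1$, $P_2$, and $P_4$ belong to cliques. Because a clique must contain at least three vertices,
the directed graph has only one clique, $\{P_1, P_2, P_4\}$.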
Dominance-Directed Graphs

In many groups of individuals or animals, there is a definite "pecking order" or dominance relation between
any two members of the group. That is, given any two individuals A and B, either A dominates B or B
dominates A, but not both. In terms of a directed graph in which $P_i \to P_j$ means $P_i$ dominates $P_j$, this means
that for all distinct pairs, either $P_i \to P_j$ or $P_j \to P_i$, but not both. In general, we have the following
definition.


DEFINITION 2

A dominance-directed graph is a directed graph such that for any distinct pair of vertices $P_i$ and $P_j$,
either $P_i \to P_j$ or $P_j \to P_i$, but not both.


An example of a directed graph satisfying this definition is a league of n sports teams that play each other
exactly one time, as in one round of a round-robin tournament in which no ties are allowed. If $P_i \to P_j$ means
that team $P_i$ beat team $P_j$ in their single match, it is easy to see that the definition of a dominance-directed
graph is satisfied. For this reason, dominance-directed graphs are sometimes called tournaments.

Figure  10.6.12  illustrates  some  dominance-directed  graphs  with  three,  four,  and  five  vertices,  respectively.  In 
these  three  graphs,  the  circled  vertices  have  the  following  interesting  property:  from  each  one  there  is  either  a 
1-step  or  a 2-step  connection  to  any  other  vertex  in  its  graph.  In  a sports  tournament,  these  vertices  would 
correspond  to  the  most  “powerful”  teams  in  the  sense  that  these  teams  either  beat  any  given  team  or  beat 
some  other  team  that  beat  the  given  team.  We  can  now  state  and  prove  a theorem  that  guarantees  that  any 
dominance-directed  graph  has  at  least  one  vertex  with  this  property. 


THEOREM 10.6.3 Connections in Dominance-Directed Graphs

In any dominance-directed graph, there is at least one vertex from which there is a 1-step or 2-step
connection to any other vertex.


Consider a vertex (there may be several) with the largest total number of 1-step and 2-step
connections to other vertices in the graph. By renumbering the vertices, we may assume that $P_1$ is such a
vertex. Suppose there is some vertex $P_i$ such that there is no 1-step or 2-step connection from $P_1$ to $P_i$. Then,
in particular, $P_1 \to P_i$ is not true, so that by definition of a dominance-directed graph, it must be that
$P_i \to P_1$. Next, let $P_k$ be any vertex such that $P_1 \to P_k$ is true. Then we cannot have $P_k \to P_i$, as then
$P_1 \to P_k \to P_i$ would be a 2-step connection from $P_1$ to $P_i$. Thus, it must be that $P_i \to P_k$. That is, $P_i$ has
1-step connections to all the vertices to which $P_1$ has 1-step connections. The vertex $P_i$ must then also have
2-step connections to all the vertices to which $P_1$ has 2-step connections. But because, in addition, we have
that $P_i \to P_1$, this means that $P_i$ has more 1-step and 2-step connections to other vertices than does $P_1$.
However, this contradicts the way in which $P_1$ was chosen. Hence, there can be no vertex $P_i$ to which $P_1$ has
no 1-step or 2-step connection.



Figure  10.6.12 

This proof shows that a vertex with the largest total number of 1-step and 2-step connections to other vertices
has the property stated in the theorem. There is a simple way of finding such vertices using the vertex matrix
M and its square $M^2$. The sum of the entries in the ith row of M is the total number of 1-step connections
from $P_i$ to other vertices, and the sum of the entries of the ith row of $M^2$ is the total number of 2-step
connections from $P_i$ to other vertices. Consequently, the sum of the entries of the ith row of the matrix
$A = M + M^2$ is the total number of 1-step and 2-step connections from $P_i$ to other vertices. In other words,
a row of $A = M + M^2$ with the largest row sum identifies a vertex having the property stated in Theorem
10.6.3.

EXAMPLE  7 Using  Theorem  10.6.3 

Suppose  that  five  baseball  teams  play  each  other  exactly  once,  and  the  results  are  as  indicated  in 
the  dominance-directed  graph  of  Figure  10.6.13.  The  vertex  matrix  of  the  graph  is 
$$M = \begin{bmatrix} 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 \end{bmatrix}$$

so

$$A = M + M^2 = \begin{bmatrix} 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 2 & 3 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 2 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 1 & 2 & 0 \\ 2 & 0 & 3 & 3 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 2 & 3 & 0 \end{bmatrix}$$

The row sums of A are

1st row sum = 4
2nd row sum = 9
3rd row sum = 2
4th row sum = 4
5th row sum = 7

Because the second row has the largest row sum, the vertex $P_2$ must have a 1-step or 2-step
connection to any other vertex. This is easily verified from Figure 10.6.13.

Figure 10.6.13


We  have  informally  suggested  that  a vertex  with  the  largest  number  of  1-step  and  2-step  connections  to  other 
vertices  is  a “powerful”  vertex.  We  can  formalize  this  concept  with  the  following  definition. 

DEFINITION 3

The power of a vertex of a dominance-directed graph is the total number of 1-step and 2-step
connections from it to other vertices. Alternatively, the power of a vertex $P_i$ is the sum of the entries
of the ith row of the matrix $A = M + M^2$, where M is the vertex matrix of the directed graph.


EXAMPLE  8 Example  7 Revisited 


Let  us  rank  the  five  baseball  teams  in  Example  7 according  to  their  powers.  From  the 
calculations  for  the  row  sums  in  that  example,  we  have 

Power of team $P_1$ = 4
Power of team $P_2$ = 9
Power of team $P_3$ = 2
Power of team $P_4$ = 4
Power of team $P_5$ = 7

Hence, the ranking of the teams according to their powers would be

$P_2$ (first), $P_5$ (second), $P_1$ and $P_4$ (tied for third), $P_3$ (last)


Exercise  Set  10.6 

1.  Construct  the  vertex  matrix  for  each  of  the  directed  graphs  illustrated  in  Figure  Ex-1. 
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Figure  Ex-1 




Answer: 


(a) 


0 

0 

0 

1 

1 

0 

1 

1 

1 

1 

0 
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0 
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1 
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0 

1 
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1 
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2.  Draw  a diagram  of  the  directed  graph  corresponding  to  each  of  the  following  vertex  matrices. 


(a)

$$\begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}$$

(b) 
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0 
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0 

1 

0 
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Answer: 

(a)  1 

p. 

p2 

(b)  ^ p' 

Ps 


P6  Ps  Pa 

3.  Let  M be  the  following  vertex  matrix  of  a directed  graph: 

$$M = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{bmatrix}$$

(a)  Draw  a diagram  of  the  directed  graph. 

(b) Use Theorem 10.6.1 to find the number of 1-, 2-, and 3-step connections from the vertex $P_1$ to the
vertex $P_2$. Verify your answer by listing the various connections as in Example 3.

(c) Repeat part (b) for the 1-, 2-, and 3-step connections from $P_1$ to $P_4$.

Answer: 


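(a) [Diagram of the directed graph on the vertices $P_1$, $P_2$, $P_3$, $P_4$]

(b) 1-step: $P_1 \to P_2$

2-step: $P_1 \to P_4 \to P_2$
        $P_1 \to P_3 \to P_2$

3-step: $P_1 \to P_2 \to P_1 \to P_2$
        $P_1 \to P_3 \to P_4 \to P_2$
        $P_1 \to P_4 \to P_3 \to P_2$

(c) 1-step: $P_1 \to P_4$

2-step: $P_1 \to P_3 \to P_4$

3-step: $P_1 \to P_2 \to P_1 \to P_4$
        $P_1 \to P_4 \to P_3 \to P_4$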
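The counts in this answer can be checked with matrix powers. The following Python sketch assumes, as the wording of the exercise suggests, that Theorem 10.6.1 identifies the (i, j) entry of $M^k$ with the number of k-step connections from $P_i$ to $P_j$. The helper names `mat_mul` and `mat_pow` are illustrative choices, not notation from the text.

```python
# Count k-step connections with powers of the vertex matrix.

def mat_mul(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, k):
    """Return M**k for an integer k >= 1 by repeated multiplication."""
    result = M
    for _ in range(k - 1):
        result = mat_mul(result, M)
    return result

# Vertex matrix of Exercise 3 (rows and columns ordered P1, P2, P3, P4).
M = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]

# Entry (i, j) of M**k counts the k-step connections from Pi to Pj
# (Python indices are zero-based, so P1 -> P2 is entry [0][1]).
p1_to_p2 = [mat_pow(M, k)[0][1] for k in (1, 2, 3)]
p1_to_p4 = [mat_pow(M, k)[0][3] for k in (1, 2, 3)]

print("P1 -> P2:", p1_to_p2)
print("P1 -> P4:", p1_to_p4)
```

The two lists hold the 1-, 2-, and 3-step counts in order, so they can be compared directly with the connections listed in parts (b) and (c).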

4. (a) Compute the matrix product $M^T M$ for the vertex matrix M in Example 1.

(b) Verify that the kth diagonal entry of $M^T M$ is the number of family members who influence the kth
family member. Why is this true?

(c) Find a similar interpretation for the values of the nondiagonal entries of $M^T M$.


Answer: 


(a)

$$M^T M = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 2 & 1 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix}$$

(c) The ijth entry is the number of family members who influence both the ith and jth family members.
5.  By  inspection,  locate  all  cliques  in  each  of  the  directed  graphs  illustrated  in  Figure  Ex-5. 


Pi 


(«) 


P3 

(h) 


Pi  P*  Ps 


(c) 

Figure  Ex-5 


Answer: 

(a)  {-Pi,  Pi,  P3} 

(b)  {P3,  P4,  P5} 

(c)  (P2,  P4,  P(„  Pz)  and  {P4,  P5.  ?6) 

6.  For  each  of  the  following  vertex  matrices,  use  Theorem  10.6.2  to  find  all  cliques  in  the  corresponding 
directed  graphs. 


(a) 


(b) 
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Answer: 
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1 

0 

1 

0 
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1 

0 

1 

0 

1 

1 

0 

1 

0 

1 
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0 

1 

1 

1 

0 

1 

1 

0 

1 

0 
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0 

1 

1 

1 

0 
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(a)  None 

(b) $\{P_3, P_4, P_6\}$


7. For the dominance-directed graph illustrated in Figure Ex-7, construct the vertex matrix and find the power
of each vertex.


Figure  Ex-7 


Answer: 

$$M = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{bmatrix}$$

Power of $P_1$ = 5
Power of $P_2$ = 3
Power of $P_3$ = 4
Power of $P_4$ = 2

8.  Five  baseball  teams  play  each  other  one  time  with  the  following  results: 

A beats  B,  C,  D 
B beats  C,  E 
C beats  D,  E 

D beats  B 
E beats  A,  D 

Rank  the  five  baseball  teams  in  accordance  with  the  powers  of  the  vertices  they  correspond  to  in  the 
dominance-directed  graph  representing  the  outcomes  of  the  games. 


Answer: 


First,  A;  second,  B and  E (tie);  fourth,  C;  fifth,  D 
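This ranking can be reproduced numerically from Definition 3. The Python sketch below builds the vertex matrix from the results listed in Exercise 8 and sums the rows of $M + M^2$. The names `vertex_powers`, `beats`, and `ranking` are illustrative, and ties are left in alphabetical order.

```python
def vertex_powers(M):
    """Return the row sums of M + M^2 (the power of each vertex)."""
    n = len(M)
    M2 = [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [sum(M[i][j] + M2[i][j] for j in range(n)) for i in range(n)]

teams = ["A", "B", "C", "D", "E"]

# Results of Exercise 8: each team mapped to the teams it beats.
beats = {"A": "BCD", "B": "CE", "C": "DE", "D": "B", "E": "AD"}

# Vertex matrix: entry (i, j) is 1 when team i beats team j.
M = [[1 if t in beats[s] else 0 for t in teams] for s in teams]

powers = dict(zip(teams, vertex_powers(M)))

# Sort by decreasing power; Python's sort is stable, so tied teams
# keep their original (alphabetical) order.
ranking = sorted(teams, key=lambda t: -powers[t])

print(powers)
print(ranking)
```

Teams B and E come out with equal power, which is the tie for second place reported in the answer.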

Technology  Exercises 

The  following  exercises  are  designed  to  be  solved  using  a technology  utility.  Typically,  this  will  be  MATLAB, 
Mathematica,  Maple,  Derive,  or  Mathcad,  but  it  may  also  be  some  other  type  of  linear  algebra  software  or  a 
scientific  calculator  with  some  linear  algebra  capabilities.  For  each  exercise  you  will  need  to  read  the 
relevant  documentation  for  the  particular  utility  you  are  using.  The  goal  of  these  exercises  is  to  provide  you 
with  a basic  proficiency  with  your  technology  utility.  Once  you  have  mastered  the  techniques  in  these 
exercises,  you  will  be  able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the  regular 
exercise  sets. 


T1. A graph having n vertices such that every vertex is connected to every other vertex has a vertex matrix
given by

$$M_n = \begin{bmatrix} 0 & 1 & 1 & \cdots & 1 \\ 1 & 0 & 1 & \cdots & 1 \\ 1 & 1 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & \cdots & 0 \end{bmatrix}$$

In this problem we develop a formula for $M_n^k$ whose (i, j)-th entry equals the number of k-step connections
from $P_i$ to $P_j$.

(a) Use a computer to compute the eight matrices $M_n^k$ for n = 2, 3 and for k = 2, 3, 4, 5.

(b) Use the results in part (a) and symmetry arguments to show that $M_n^k$ can be written as
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where  Un  is  the  ^ x n matrix  all  of  whose  entries  are  ones  and  ln  is  the  n x n identity  matrix. 

(f)  Show  that  for  n > 2,  all  vertices  for  these  directed  graphs  belong  to  cliques. 

T2. Consider a round-robin tournament among n players (labeled $a_1, a_2, a_3, \ldots, a_n$) where $a_1$ beats $a_2$, $a_2$
beats $a_3$, $a_3$ beats $a_4$, ..., $a_{n-1}$ beats $a_n$, and $a_n$ beats $a_1$. Compute the “power” of each player, showing that
they all have the same power; then determine that common power.

[Hint:  Use  a computer  to  study  the  cases  n = 3,  4,  5,  6;  then  make  a conjecture  and  prove  your  conjecture  to 
be  true.] 
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10.7  Games  of  Strategy 

In  this  section  we  discuss  a general  game  in  which  two  competing  players  choose  separate  strategies  to  reach  opposing 
objectives.  The  optimal  strategy  of  each  player  is  found  in  certain  cases  with  the  use  of  matrix  techniques. 


Prerequisites 

Matrix  Multiplication 
Basic  Probability  Concepts 


Game  Theory 


To  introduce  the  basic  concepts  in  the  theory  of  games,  we  will  consider  the  following  carnival-type  game  that  two 
people  agree  to  play.  We  will  call  the  participants  in  the  game  player  R and  player  C.  Each  player  has  a stationary  wheel 
with  a movable  pointer  on  it  as  in  Figure  10.7.1.  For  reasons  that  will  become  clear,  we  will  call  player  R's  wheel  the 
row-wheel  and  player  C's  wheel  the  column-wheel.  The  row-wheel  is  divided  into  three  sectors  numbered  1,  2,  and  3, 
and the column-wheel is divided into four sectors numbered 1, 2, 3, and 4. The fractions of the area occupied by the
various sectors are indicated in the figure. To play the game, each player spins the pointer of his or her wheel and lets it
come to rest at random. The number of the sector in which each pointer comes to rest is called the move of that player.
Thus, player R has three possible moves and player C has four possible moves. Depending on the move each player
makes, player C then makes a payment of money to player R according to Table 1.


Figure 10.7.1


Table 1

                          Player C's Move
                        1      2      3      4
Player R's      1      $3     $5    -$2    -$1
Move            2     -$2     $4    -$3    -$4
                3      $6    -$5     $0     $3

For example, if the row-wheel pointer comes to rest in sector 1 (player R makes move 1), and the column-wheel pointer
comes to rest in sector 2 (player C makes move 2), then player C must pay player R the sum of $5. Some of the entries
in this table are negative, indicating that player C makes a negative payment to player R. By this we mean that player R
makes a positive payment to player C. For example, if the row-wheel shows 2 and the column-wheel shows 4, then
player R pays player C the sum of $4, because the corresponding entry in the table is -$4. In this way the positive entries
of the table are the gains of player R and the losses of player C, and the negative entries are the gains of player C and the
losses of player R.

In  this  game  the  players  have  no  control  over  their  moves;  each  move  is  determined  by  chance.  However,  if  each  player 
can  decide  whether  he  or  she  wants  to  play,  then  each  would  want  to  know  how  much  he  or  she  can  expect  to  win  or  lose 
over  the  long  term  if  he  or  she  chooses  to  play.  (Later  in  the  section  we  will  discuss  this  question  and  also  consider  a 
more  complicated  situation  in  which  the  players  can  exercise  some  control  over  their  moves  by  varying  the  sectors  of 
their  wheels.) 


Two-Person  Zero-Sum  Matrix  Games 

The  game  described  above  is  an  example  of  a two-person  zero-sum  matrix  game.  The  term  zero-sum  means  that  in  each 
play  of  the  game,  the  positive  gain  of  one  player  is  equal  to  the  negative  gain  (loss)  of  the  other  player.  That  is,  the  sum 
of  the  two  gains  is  zero.  The  term  matrix  game  is  used  to  describe  a two-person  game  in  which  each  player  has  only  a 
finite  number  of  moves,  so  that  all  possible  outcomes  of  each  play,  and  the  corresponding  gains  of  the  players,  can  be 
displayed in tabular or matrix form, as in Table 1.

In a general game of this type, let player R have m possible moves and let player C have n possible moves. In a play of
the game, each player makes one of his or her possible moves, and then a payoff is made from player C to player R,
depending on the moves. For i = 1, 2, ..., m, and j = 1, 2, ..., n, let us set

$a_{ij}$ = payoff that player C makes to player R if player R makes move i and player C makes move j

This payoff need not be money; it can be any type of commodity to which we can attach a numerical value. As before, if
an entry $a_{ij}$ is negative, we mean that player C receives a payoff of $|a_{ij}|$ from player R. We arrange these mn possible
payoffs in the form of an m × n matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

which we will call the payoff matrix of the game.

Each player is to make his or her moves on a probabilistic basis. For example, for the game discussed in the
introduction, the ratio of the area of a sector to the area of the wheel would be the probability that the player makes the
move corresponding to that sector. Thus, from Figure 10.7.1, we see that player R would make move 2 with probability
$\tfrac{1}{3}$, and player C would make move 2 with probability $\tfrac{1}{4}$. In the general case we make the following definitions:

$p_i$ = probability that player R makes move i    (i = 1, 2, ..., m)
$q_j$ = probability that player C makes move j    (j = 1, 2, ..., n)

It follows from these definitions that

$$p_1 + p_2 + \cdots + p_m = 1$$

and

$$q_1 + q_2 + \cdots + q_n = 1$$

With the probabilities $p_i$ and $q_j$ we form two vectors:

$$\mathbf{p} = \begin{bmatrix} p_1 & p_2 & \cdots & p_m \end{bmatrix} \quad \text{and} \quad \mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix}$$

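We call the row vector p the strategy of player R and the column vector q the strategy of player C. For example, from
Figure 10.7.1 we have

$$\mathbf{p} = \begin{bmatrix} \tfrac{1}{6} & \tfrac{1}{3} & \tfrac{1}{2} \end{bmatrix}, \qquad \mathbf{q} = \begin{bmatrix} \tfrac{1}{4} \\ \tfrac{1}{4} \\ \tfrac{1}{3} \\ \tfrac{1}{6} \end{bmatrix}$$

for the carnival game described earlier.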


From the theory of probability, if the probability that player R makes move i is $p_i$, and independently the probability that
player C makes move j is $q_j$, then $p_i q_j$ is the probability that for any one play of the game, player R makes move i and
player C makes move j. The payoff to player R for such a pair of moves is $a_{ij}$. If we multiply each possible payoff by its
corresponding probability and sum over all possible payoffs, we obtain the expression

$$a_{11}p_1q_1 + a_{12}p_1q_2 + \cdots + a_{1n}p_1q_n + a_{21}p_2q_1 + \cdots + a_{mn}p_mq_n \tag{1}$$

Equation 1 is a weighted average of the payoffs to player R; each payoff is weighted according to the probability of its
occurrence. In the theory of probability, this weighted average is called the expected payoff to player R. It can be shown
that if the game is played many times, the long-term average payoff per play to player R is given by this expression. We
denote this expected payoff by $E(\mathbf{p}, \mathbf{q})$ to emphasize the fact that it depends on the strategies of the two players. From
the definition of the payoff matrix A and the strategies p and q, it can be verified that we may express the expected
payoff in matrix notation as

$$E(\mathbf{p}, \mathbf{q}) = \begin{bmatrix} p_1 & p_2 & \cdots & p_m \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix} = \mathbf{p}A\mathbf{q} \tag{2}$$

Because $E(\mathbf{p}, \mathbf{q})$ is the expected payoff to player R, it follows that $-E(\mathbf{p}, \mathbf{q})$ is the expected payoff to player C.


EXAMPLE  1 Expected  Payoff  to  Player  R 


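For the carnival game described earlier, we have

$$E(\mathbf{p}, \mathbf{q}) = \mathbf{p}A\mathbf{q} = \begin{bmatrix} \tfrac{1}{6} & \tfrac{1}{3} & \tfrac{1}{2} \end{bmatrix} \begin{bmatrix} 3 & 5 & -2 & -1 \\ -2 & 4 & -3 & -4 \\ 6 & -5 & 0 & 3 \end{bmatrix} \begin{bmatrix} \tfrac{1}{4} \\ \tfrac{1}{4} \\ \tfrac{1}{3} \\ \tfrac{1}{6} \end{bmatrix} = \frac{13}{72} = .1805\ldots$$

Thus, in the long run, player R can expect to receive an average of about 18 cents from player C in each
play of the game.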
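The expected payoff in this example can be checked with exact arithmetic. The Python sketch below evaluates the double sum of Equation 1 using the payoffs of Table 1. The strategy vectors are an assumption here: they are the sector fractions read from Figure 10.7.1, taken as p = [1/6, 1/3, 1/2] and q = [1/4, 1/4, 1/3, 1/6], and the function name `expected_payoff` is illustrative.

```python
from fractions import Fraction as F

def expected_payoff(p, A, q):
    """Return E(p, q) = sum over i, j of a_ij * p_i * q_j (Equation 1)."""
    return sum(p[i] * A[i][j] * q[j]
               for i in range(len(p))
               for j in range(len(q)))

# Payoff matrix from Table 1 (rows: player R's moves, columns: player C's moves).
A = [
    [3, 5, -2, -1],
    [-2, 4, -3, -4],
    [6, -5, 0, 3],
]

# Assumed strategies: sector fractions of the two wheels in Figure 10.7.1.
p = [F(1, 6), F(1, 3), F(1, 2)]
q = [F(1, 4), F(1, 4), F(1, 3), F(1, 6)]

E = expected_payoff(p, A, q)
print(E, float(E))
```

Using `Fraction` avoids rounding, so the result can be compared with the decimal value quoted in the example.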


So  far  we  have  been  discussing  the  situation  in  which  each  player  has  a predetermined  strategy.  We  will  now  consider 
the  more  difficult  situation  in  which  both  players  can  change  their  strategies  independently.  For  example,  in  the  game 
described  in  the  introduction,  we  would  allow  both  players  to  alter  the  areas  of  the  sectors  of  their  wheels  and  thereby 
control  the  probabilities  of  their  respective  moves.  This  qualitatively  changes  the  nature  of  the  problem  and  puts  us 
firmly  in  the  field  of  true  game  theory.  It  is  understood  that  neither  player  knows  what  strategy  the  other  will  choose.  It 
is  also  assumed  that  each  player  will  make  the  best  possible  choice  of  strategy  and  that  the  other  player  knows  this. 

Thus, player R attempts to choose a strategy p such that $E(\mathbf{p}, \mathbf{q})$ is as large as possible for the best strategy q that player
C can choose; and similarly, player C attempts to choose a strategy q such that $E(\mathbf{p}, \mathbf{q})$ is as small as possible for the
best strategy p that player R can choose. To see that such choices are actually possible, we will need the following
theorem, called the Fundamental Theorem of Two-Person Zero-Sum Games. (The general proof, which involves ideas
from the theory of linear programming, will be omitted. However, below we will prove this theorem for what are called
strictly determined games and 2 × 2 matrix games.)


Fundamental Theorem of Zero-Sum Games

There exist strategies $\mathbf{p}^*$ and $\mathbf{q}^*$ such that

$$E(\mathbf{p}^*, \mathbf{q}) \ge E(\mathbf{p}^*, \mathbf{q}^*) \ge E(\mathbf{p}, \mathbf{q}^*) \tag{3}$$

for all strategies p and q.


The strategies $\mathbf{p}^*$ and $\mathbf{q}^*$ in this theorem are the best possible strategies for players R and C, respectively. To see why
this is so, let $v = E(\mathbf{p}^*, \mathbf{q}^*)$. The left-hand inequality of Equation 3 then reads

$$E(\mathbf{p}^*, \mathbf{q}) \ge v \quad \text{for all strategies } \mathbf{q}$$

This means that if player R chooses the strategy $\mathbf{p}^*$, then no matter what strategy q player C chooses, the expected
payoff to player R will never be below v. Moreover, it is not possible for player R to achieve an expected payoff greater
than v. To see why, suppose there is some strategy $\mathbf{p}^{**}$ that player R can choose such that

$$E(\mathbf{p}^{**}, \mathbf{q}) > v \quad \text{for all strategies } \mathbf{q}$$

Then, in particular,

$$E(\mathbf{p}^{**}, \mathbf{q}^*) > v$$

But this contradicts the right-hand inequality of Equation 3, which requires that $v \ge E(\mathbf{p}^{**}, \mathbf{q}^*)$. Consequently, the best
player R can do is prevent his or her expected payoff from falling below the value v. Similarly, the best player C can do
is ensure that player R's expected payoff does not exceed v, and this can be achieved by using strategy $\mathbf{q}^*$.

On  the  basis  of  this  discussion,  we  arrive  at  the  following  definitions. 


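DEFINITION 1

If $\mathbf{p}^*$ and $\mathbf{q}^*$ are strategies such that

$$E(\mathbf{p}^*, \mathbf{q}) \ge E(\mathbf{p}^*, \mathbf{q}^*) \ge E(\mathbf{p}, \mathbf{q}^*) \tag{4}$$

for all strategies p and q, then

(i) $\mathbf{p}^*$ is called an optimal strategy for player R.

(ii) $\mathbf{q}^*$ is called an optimal strategy for player C.

(iii) $v = E(\mathbf{p}^*, \mathbf{q}^*)$ is called the value of the game.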

The wording in this definition suggests that optimal strategies are not necessarily unique. This is indeed the case, and in
Exercise 2 we ask you to show this. However, it can be proved that any two sets of optimal strategies always result in
the same value v of the game. That is, if $\mathbf{p}^*$, $\mathbf{q}^*$ and $\mathbf{p}^{**}$, $\mathbf{q}^{**}$ are optimal strategies, then

$$E(\mathbf{p}^*, \mathbf{q}^*) = E(\mathbf{p}^{**}, \mathbf{q}^{**}) \tag{5}$$

The value of a game is thus the expected payoff to player R when both players choose any possible optimal strategies.

To find optimal strategies, we must find vectors $\mathbf{p}^*$ and $\mathbf{q}^*$ that satisfy Equation 4. This is generally done by using linear
programming techniques. Next, we discuss special cases for which optimal strategies may be found by more elementary
techniques.


We  now  introduce  the  following  definition. 


DEFINITION 2

An entry $a_{rs}$ in a payoff matrix A is called a saddle point if

(i) $a_{rs}$ is the smallest entry in its row, and

(ii) $a_{rs}$ is the largest entry in its column.

A game whose payoff matrix has a saddle point is called strictly determined.

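For example, the boxed element in each of the following payoff matrices is a saddle point:

$$\begin{bmatrix} 3 & \boxed{1} \\ -4 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 30 & -50 & -5 \\ \boxed{60} & 90 & 75 \\ -10 & 60 & -30 \end{bmatrix}, \qquad \begin{bmatrix} 0 & -3 & 5 & -9 \\ 15 & -8 & -2 & 10 \\ 7 & 10 & \boxed{6} & 9 \\ 6 & 11 & -3 & 2 \end{bmatrix}$$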

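If a matrix has a saddle point $a_{rs}$, it turns out that the following strategies are optimal strategies for the two players:

$$\mathbf{p}^* = \begin{bmatrix} 0 & 0 & \cdots & 1 & \cdots & 0 \end{bmatrix} \ (\text{1 in the } r\text{th entry}), \qquad \mathbf{q}^* = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix} \ (\text{1 in the } s\text{th entry})$$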

That  is,  an  optimal  strategy  for  player  R is  to  always  make  the  rth  move,  and  an  optimal  strategy  for  player  C is  to 
always  make  the  sth  move.  Such  strategies  for  which  only  one  move  is  possible  are  called  pure  strategies.  Strategies  for 
which  more  than  one  move  is  possible  are  called  mixed  strategies.  To  show  that  the  above  pure  strategies  are  optimal, 
you  can  verify  the  following  three  equations  (see  Exercise  6): 


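$$E(\mathbf{p}^*, \mathbf{q}^*) = \mathbf{p}^* A \mathbf{q}^* = a_{rs} \tag{6}$$

$$E(\mathbf{p}^*, \mathbf{q}) = \mathbf{p}^* A \mathbf{q} \ge a_{rs} \quad \text{for any strategy } \mathbf{q} \tag{7}$$

$$E(\mathbf{p}, \mathbf{q}^*) = \mathbf{p} A \mathbf{q}^* \le a_{rs} \quad \text{for any strategy } \mathbf{p} \tag{8}$$

Together, these three equations imply that

$$E(\mathbf{p}^*, \mathbf{q}) \ge E(\mathbf{p}^*, \mathbf{q}^*) \ge E(\mathbf{p}, \mathbf{q}^*)$$

for all strategies p and q. Because this is exactly Equation 4, it follows that $\mathbf{p}^*$ and $\mathbf{q}^*$ are optimal strategies.

From Equation 6 the value of a strictly determined game is simply the numerical value of a saddle point $a_{rs}$. It is
possible for a payoff matrix to have several saddle points, but then the uniqueness of the value of a game guarantees that
the numerical values of all saddle points are the same.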

EXAMPLE  2 Optimal  Strategies  to  Maximize  a Viewing  Audience 

Two  competing  television  networks,  R and  C,  are  scheduling  one-hour  programs  in  the  same  time  period. 
Network  R can  schedule  one  of  three  possible  programs,  and  network  C can  schedule  one  of  four  possible 
programs.  Neither  network  knows  which  program  the  other  will  schedule.  Both  networks  ask  the  same 
outside  polling  agency  to  give  them  an  estimate  of  how  all  possible  pairings  of  the  programs  will  divide  the 
viewing audience. The agency gives them each Table 2, whose (i, j)-th entry is the percentage of the
viewing audience that will watch network R if network R's program i is paired against network C's program
j. What program should each network schedule in order to maximize its viewing audience?


Table 2

                          Network C's Program
                        1      2      3      4
Network R's     1      60     20     30     55
Program         2      50     75     45     60
                3      70     45     35     30

Subtract 50 from each entry in Table 2 to construct the following matrix:

$$\begin{bmatrix} 10 & -30 & -20 & 5 \\ 0 & 25 & -5 & 10 \\ 20 & -5 & -15 & -20 \end{bmatrix}$$

This is the payoff matrix of the two-person zero-sum game in which each network is considered to start
with 50% of the audience, and the (i, j)-th entry of the matrix is the percentage of the viewing audience
that network C loses to network R if programs i and j are paired against each other. It is easy to see that the
entry

$$a_{23} = -5$$

is a saddle point of the payoff matrix. Hence, the optimal strategy of network R is to schedule program 2,
and the optimal strategy of network C is to schedule program 3. This will result in network R's receiving
45% of the audience and network C's receiving 55% of the audience.
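The saddle-point test of Definition 2 is easy to mechanize. The following Python sketch is one straightforward way to scan a payoff matrix for entries that are smallest in their row and largest in their column; the function name `saddle_points` and the 1-based (row, column, value) output format are illustrative choices.

```python
def saddle_points(A):
    """Return a list of (r, s, value) for every saddle point of A.

    An entry is a saddle point when it is the smallest entry in its row
    and the largest entry in its column. Indices r and s are 1-based to
    match the a_rs notation.
    """
    points = []
    for r, row in enumerate(A):
        for s, entry in enumerate(row):
            column = [A[i][s] for i in range(len(A))]
            if entry == min(row) and entry == max(column):
                points.append((r + 1, s + 1, entry))
    return points

# Table 2: percentage of the audience watching network R.
table2 = [
    [60, 20, 30, 55],
    [50, 75, 45, 60],
    [70, 45, 35, 30],
]

# Payoff matrix of the zero-sum game: subtract 50 from each entry.
payoff = [[x - 50 for x in row] for row in table2]

print(saddle_points(payoff))
```

An empty list means the game is not strictly determined, in which case pure strategies are not optimal and a mixed strategy must be found by other means.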


2x2  Matrix  Games 


Another case in which the optimal strategies can be found by elementary means occurs when each player has only two
possible moves. In this case, the payoff matrix is a 2 × 2 matrix

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

If the game is strictly determined, at least one of the four entries of A is a saddle point, and the techniques discussed
above can then be applied to determine optimal strategies for the two players. If the game is not strictly determined, we
first compute the expected payoff for arbitrary strategies p and q:

$$E(\mathbf{p}, \mathbf{q}) = \mathbf{p}A\mathbf{q} = \begin{bmatrix} p_1 & p_2 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = a_{11}p_1q_1 + a_{12}p_1q_2 + a_{21}p_2q_1 + a_{22}p_2q_2 \tag{9}$$

Because

$$p_1 + p_2 = 1 \quad \text{and} \quad q_1 + q_2 = 1 \tag{10}$$

we may substitute $p_2 = 1 - p_1$ and $q_2 = 1 - q_1$ into 9 to obtain

$$E(\mathbf{p}, \mathbf{q}) = a_{11}p_1q_1 + a_{12}p_1(1 - q_1) + a_{21}(1 - p_1)q_1 + a_{22}(1 - p_1)(1 - q_1) \tag{11}$$

If we rearrange the terms in Equation 11, we can write

$$E(\mathbf{p}, \mathbf{q}) = \left[(a_{11} + a_{22} - a_{12} - a_{21})p_1 - (a_{22} - a_{21})\right]q_1 + (a_{12} - a_{22})p_1 + a_{22} \tag{12}$$

By examining the coefficient of the $q_1$ term in 12, we see that if we set

$$p_1 = p_1^* = \frac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{13}$$

then that coefficient is zero, and 12 reduces to

$$E(\mathbf{p}^*, \mathbf{q}) = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{14}$$

Equation 14 is independent of q; that is, if player R chooses the strategy determined by 13, player C cannot change the
expected payoff by varying his or her strategy.

In a similar manner, it can be verified that if player C chooses the strategy determined by

$$q_1 = q_1^* = \frac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{15}$$

then substituting in 12 gives

$$E(\mathbf{p}, \mathbf{q}^*) = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \tag{16}$$

Equations 14 and 16 show that

$$E(\mathbf{p}^*, \mathbf{q}) = E(\mathbf{p}^*, \mathbf{q}^*) = E(\mathbf{p}, \mathbf{q}^*) \tag{17}$$

for all strategies p and q. Thus, the strategies determined by 13, 15, and 10 are optimal strategies for players R and C,
respectively, and so we have the following result.


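Optimal Strategies for a 2 × 2 Matrix Game

For a 2 × 2 game that is not strictly determined, optimal strategies for players R and C are

$$\mathbf{p}^* = \begin{bmatrix} \dfrac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} & \dfrac{a_{11} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \end{bmatrix}$$

and

$$\mathbf{q}^* = \begin{bmatrix} \dfrac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} \\[2ex] \dfrac{a_{11} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} \end{bmatrix}$$

The value of the game is

$$v = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}}$$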

In order to be complete, we must show that the entries in the vectors $\mathbf{p}^*$ and $\mathbf{q}^*$ are numbers strictly between 0 and 1. In
Exercise 8 we ask you to show that this is the case as long as the game is not strictly determined.


Equation 17 is interesting in that it implies that either player can force the expected payoff to be the value of the game
by choosing his or her optimal strategy, regardless of which strategy the other player chooses. This is not true, in
general, for games in which either player has more than two moves.

EXAMPLE  3 Using  Theorem  10.7.2 


The  federal  government  desires  to  inoculate  its  citizens  against  a certain  flu  virus.  The  virus  has  two 
strains,  and  the  proportions  in  which  the  two  strains  occur  in  the  virus  population  is  not  known.  Two 
vaccines  have  been  developed  and  each  citizen  is  given  only  one  of  them.  Vaccine  1 is  85%  effective 
against  strain  1 and  70%  effective  against  strain  2.  Vaccine  2 is  60%  effective  against  strain  1 and  90% 
effective  against  strain  2.  What  inoculation  policy  should  the  government  adopt? 


We  can  consider  this  a two-person  game  in  which  player  R (the  government)  desires  to  make 
the  payoff  (the  fraction  of  citizens  resistant  to  the  virus)  as  large  as  possible,  and  player  C (the  virus) 
desires  to  make  the  payoff  as  small  as  possible.  The  payoff  matrix  is 

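$$\begin{array}{c|cc} & \text{Strain 1} & \text{Strain 2} \\ \hline \text{Vaccine 1} & .85 & .70 \\ \text{Vaccine 2} & .60 & .90 \end{array}$$

This matrix has no saddle points, so Theorem 10.7.2 is applicable. Consequently,

$$p_1^* = \frac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{.90 - .60}{.85 + .90 - .70 - .60} = \frac{.30}{.45} = \frac{2}{3}$$

$$p_2^* = 1 - p_1^* = 1 - \frac{2}{3} = \frac{1}{3}$$

$$q_1^* = \frac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{.90 - .70}{.85 + .90 - .70 - .60} = \frac{.20}{.45} = \frac{4}{9}$$

$$q_2^* = 1 - q_1^* = 1 - \frac{4}{9} = \frac{5}{9}$$

$$v = \frac{a_{11}a_{22} - a_{12}a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{(.85)(.90) - (.70)(.60)}{.85 + .90 - .70 - .60} = \frac{.345}{.45} = .7666\ldots$$

Thus, the optimal strategy for the government is to inoculate $\tfrac{2}{3}$ of the citizens with vaccine 1 and $\tfrac{1}{3}$ of the
citizens with vaccine 2. This will guarantee that about 76.7% of the citizens will be resistant to a virus
attack regardless of the distribution of the two strains.

In contrast, a virus distribution of $\tfrac{4}{9}$ of strain 1 and $\tfrac{5}{9}$ of strain 2 will result in the same 76.7% of resistant
citizens, regardless of the inoculation strategy adopted by the government (see Exercise 7).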
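The formulas of Theorem 10.7.2 can be evaluated mechanically. The Python sketch below applies them to the vaccine payoff matrix using exact fractions; it assumes the game has no saddle point (so the common denominator is nonzero), and the function name `solve_2x2` is an illustrative choice. It also checks the remark above by confirming that, against p*, both pure strategies of player C give the same expected payoff.

```python
from fractions import Fraction as F

def solve_2x2(A):
    """Optimal strategies and value of a 2 x 2 game with no saddle point.

    Returns (p, q, v) where p and q are pairs of probabilities.
    Assumes the game is not strictly determined, so the denominator
    a11 + a22 - a12 - a21 is nonzero.
    """
    (a11, a12), (a21, a22) = [[F(x) for x in row] for row in A]
    d = a11 + a22 - a12 - a21
    p = ((a22 - a21) / d, (a11 - a12) / d)
    q = ((a22 - a12) / d, (a11 - a21) / d)
    v = (a11 * a22 - a12 * a21) / d
    return p, q, v

# Payoff matrix of Example 3 (rows: vaccines, columns: strains).
A = [
    [F(85, 100), F(70, 100)],
    [F(60, 100), F(90, 100)],
]

p, q, v = solve_2x2(A)
print(p, q, v, float(v))

# Expected payoff of p* against each pure strategy of player C.
against_strain = [p[0] * A[0][j] + p[1] * A[1][j] for j in range(2)]
print(against_strain)
```

Because both entries of `against_strain` equal the value of the game, any mixture of the two strains yields the same expected fraction of resistant citizens.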


Exercise  Set  10.7 


6 

-7 

0 


-4  1 

3 8 

6 -2 


1.  Suppose  that  a game  has  a payoff  matrix 


(a)  If  players  R and  C use  strategies 


respectively,  what  is  the  expected  payoff  of  the  game? 

(b)  If  player  C keeps  his  strategy  fixed  as  in  part  (a),  what  strategy  should  player  R choose  to  maximize  his  expected 
payoff? 

(c) If player R keeps her strategy fixed as in part (a), what strategy should player C choose to minimize the expected
payoff to player R?

Answer:


(a) $-5/8$

(b) $[0 \;\; 1 \;\; 0]$

(c) $[1 \;\; 0 \;\; 0 \;\; 0]^T$

2.  Construct  a simple  example  to  show  that  optimal  strategies  are  not  necessarily  unique.  For  example,  find  a payoff 
matrix  with  several  equal  saddle  points. 

Answer: 

Let  A = | | , for  example. 

3.  For  the  strictly  determined  games  with  the  following  payoff  matrices,  find  optimal  strategies  for  the  two  players,  and 
find  the  values  of  the  games. 

(a) $\begin{bmatrix} 5 & 2 \\ 7 & 3 \end{bmatrix}$

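(b) $\begin{bmatrix} -3 & -2 \\ 2 & 4 \\ -4 & 1 \end{bmatrix}$

(c) $\begin{bmatrix} 2 & -2 & 0 \\ -6 & 0 & -5 \\ 5 & 2 & 3 \end{bmatrix}$

(d) $\begin{bmatrix} -3 & 2 & -1 \\ -2 & -1 & 5 \\ -4 & 1 & 0 \\ -3 & 4 & 6 \end{bmatrix}$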


Answer: 


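(a) $\mathbf{p}^* = [0 \;\; 1]$, $\mathbf{q}^* = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $v = 3$

(b) $\mathbf{p}^* = [0 \;\; 1 \;\; 0]$, $\mathbf{q}^* = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $v = 2$

(c) $\mathbf{p}^* = [0 \;\; 0 \;\; 1]$, $\mathbf{q}^* = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, $v = 2$

(d) $\mathbf{p}^* = [0 \;\; 1 \;\; 0 \;\; 0]$, $\mathbf{q}^* = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $v = -2$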


4.  For  the  2 x 2 games  with  the  following  payoff  matrices,  find  optimal  strategies  for  the  two  players,  and  find  the 
values  of  the  games. 

(a) $\begin{bmatrix} 6 & 3 \\ -1 & 4 \end{bmatrix}$

(b) $\begin{bmatrix} 40 & 20 \\ -10 & 30 \end{bmatrix}$

(d)  [3  5' 
5 2 


Answer: 


5.  Player  R has  two  playing  cards:  a black  ace  and  a red  four.  Player  C also  has  two  cards:  a black  two  and  a red  three. 
Each  player  secretly  selects  one  of  his  or  her  cards.  If  both  selected  cards  are  the  same  color,  player  C pays  player  R 
the  sum  of  the  face  values  in  dollars.  If  the  cards  are  different  colors,  player  R pays  player  C the  sum  of  the  face 
values.  What  are  optimal  strategies  for  both  players,  and  what  is  the  value  of  the  game? 


Answer: 


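$$\mathbf{p}^* = \begin{bmatrix} \tfrac{13}{20} & \tfrac{7}{20} \end{bmatrix}, \qquad \mathbf{q}^* = \begin{bmatrix} \tfrac{11}{20} \\[1ex] \tfrac{9}{20} \end{bmatrix}, \qquad v = -\frac{3}{20}$$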


6.  Verify  Equations  6,  7,  and  8. 

7.  Verify  the  statement  in  the  last  paragraph  of  Example  3. 

8. Show that the entries of the optimal strategies $\mathbf{p}^*$ and $\mathbf{q}^*$ given in Theorem 10.7.2 are numbers strictly between zero
and one.


Technology  Exercises 


The  following  exercises  are  designed  to  be  solved  using  a technology  utility.  Typically,  this  will  be  MATLAB, 
Mathematica , Maple,  Derive,  or  Mathcad,  but  it  may  also  be  some  other  type  of  linear  algebra  software  or  a scientific 
calculator  with  some  linear  algebra  capabilities.  For  each  exercise  you  will  need  to  read  the  relevant  documentation  for 
the  particular  utility  you  are  using.  The  goal  of  these  exercises  is  to  provide  you  with  a basic  proficiency  with  your 
technology  utility.  Once  you  have  mastered  the  techniques  in  these  exercises,  you  will  be  able  to  use  your  technology 
utility  to  solve  many  of  the  problems  in  the  regular  exercise  sets. 

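T1. Consider a game between two players where each player can make up to n different moves (n > 1). If the ith move
of player R and the jth move of player C are such that i + j is even, then C pays R $1. If i + j is odd, then R pays C $1.
Assume that both players have the same strategy, that is, $\mathbf{p}_n = [p_i]_{1 \times n}$ and $\mathbf{q}_n = [p_i]_{n \times 1}$, where
$p_1 + p_2 + p_3 + \cdots + p_n = 1$. Use a computer to show that

$$E(\mathbf{p}_2, \mathbf{q}_2) = (p_1 - p_2)^2$$

$$E(\mathbf{p}_3, \mathbf{q}_3) = (p_1 - p_2 + p_3)^2$$

$$E(\mathbf{p}_4, \mathbf{q}_4) = (p_1 - p_2 + p_3 - p_4)^2$$

$$E(\mathbf{p}_5, \mathbf{q}_5) = (p_1 - p_2 + p_3 - p_4 + p_5)^2$$

Using these results as a guide, prove in general that the expected payoff to player R is

$$E(\mathbf{p}_n, \mathbf{q}_n) = \left( \sum_{j=1}^{n} (-1)^{j+1} p_j \right)^2 \ge 0$$

which shows that in the long run, player R will not lose in this game.

T2. Consider a game between two players where each player can make up to n different moves (n > 1). If both players
make the same move, then player C pays player R $(n − 1). However, if both players make different moves, then
player R pays player C $1. Assume that both players have the same strategy, that is, $\mathbf{p}_n = [p_i]_{1 \times n}$ and $\mathbf{q}_n = [p_i]_{n \times 1}$,
where $p_1 + p_2 + p_3 + \cdots + p_n = 1$. Use a computer to show that

$$E(\mathbf{p}_2, \mathbf{q}_2) = \tfrac{1}{2}(p_1 - p_1)^2 + \tfrac{1}{2}(p_1 - p_2)^2 + \tfrac{1}{2}(p_2 - p_1)^2 + \tfrac{1}{2}(p_2 - p_2)^2$$

$$\begin{aligned} E(\mathbf{p}_3, \mathbf{q}_3) = {} & \tfrac{1}{2}(p_1 - p_1)^2 + \tfrac{1}{2}(p_1 - p_2)^2 + \tfrac{1}{2}(p_1 - p_3)^2 \\ & + \tfrac{1}{2}(p_2 - p_1)^2 + \tfrac{1}{2}(p_2 - p_2)^2 + \tfrac{1}{2}(p_2 - p_3)^2 \\ & + \tfrac{1}{2}(p_3 - p_1)^2 + \tfrac{1}{2}(p_3 - p_2)^2 + \tfrac{1}{2}(p_3 - p_3)^2 \end{aligned}$$

$$\begin{aligned} E(\mathbf{p}_4, \mathbf{q}_4) = {} & \tfrac{1}{2}(p_1 - p_1)^2 + \tfrac{1}{2}(p_1 - p_2)^2 + \tfrac{1}{2}(p_1 - p_3)^2 + \tfrac{1}{2}(p_1 - p_4)^2 \\ & + \tfrac{1}{2}(p_2 - p_1)^2 + \tfrac{1}{2}(p_2 - p_2)^2 + \tfrac{1}{2}(p_2 - p_3)^2 + \tfrac{1}{2}(p_2 - p_4)^2 \\ & + \tfrac{1}{2}(p_3 - p_1)^2 + \tfrac{1}{2}(p_3 - p_2)^2 + \tfrac{1}{2}(p_3 - p_3)^2 + \tfrac{1}{2}(p_3 - p_4)^2 \\ & + \tfrac{1}{2}(p_4 - p_1)^2 + \tfrac{1}{2}(p_4 - p_2)^2 + \tfrac{1}{2}(p_4 - p_3)^2 + \tfrac{1}{2}(p_4 - p_4)^2 \end{aligned}$$

Using these results as a guide, prove in general that the expected payoff to player R is

$$E(\mathbf{p}_n, \mathbf{q}_n) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (p_i - p_j)^2 \ge 0$$

which shows that in the long run, player R will not lose in this game.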
10.8 Leontief Economic Models

In  this  section  we  discuss  two  linear  models  for  economic  systems.  Some  results  about  nonnegative  matrices  are  applied  to  determine 
equilibrium  price  structures  and  outputs  necessary  to  satisfy  demand. 


Prerequisites 

Linear  Systems 
Matrices 


Economic  Systems 

Matrix  theory  has  been  very  successful  in  describing  the  interrelations  among  prices,  outputs,  and  demands  in  economic  systems.  In 
this  section  we  discuss  some  simple  models  based  on  the  ideas  of  Nobel  laureate  Wassily  Leontief.  We  examine  two  different  but 
related  models:  the  closed  or  input-output  model,  and  the  open  or  production  model.  In  each,  we  are  given  certain  economic 
parameters  that  describe  the  interrelations  between  the  “industries”  in  the  economy  under  consideration.  Using  matrix  theory,  we  then 
evaluate  certain  other  parameters,  such  as  prices  or  output  levels,  in  order  to  satisfy  a desired  economic  objective.  We  begin  with  the 
closed  model. 


Leontief  Closed  (Input-Output)  Model 

First  we  present  a simple  example;  then  we  proceed  to  the  general  theory  of  the  model. 

EXAMPLE  1 An  Input-Output  Model 

Three  homeowners — a carpenter,  an  electrician,  and  a plumber — agree  to  make  repairs  in  their  three  homes.  They  agree 
to  work  a total  of  10  days  each  according  to  the  following  schedule: 


                                       Work Performed by
                                       Carpenter   Electrician   Plumber
Days of Work in Home of Carpenter          2            1           6
Days of Work in Home of Electrician        4            5           1
Days of Work in Home of Plumber            4            4           3

For  tax  purposes,  they  must  report  and  pay  each  other  a reasonable  daily  wage,  even  for  the  work  each  does  on  his  or  her 
own  home.  Their  normal  daily  wages  are  about  $100,  but  they  agree  to  adjust  their  respective  daily  wages  so  that  each 
homeowner  will  come  out  even — that  is,  so  that  the  total  amount  paid  out  by  each  is  the  same  as  the  total  amount  each 
receives.  We  can  set 

\(p_1\) = daily wage of carpenter
\(p_2\) = daily wage of electrician
\(p_3\) = daily wage of plumber

To  satisfy  the  “equilibrium”  condition  that  each  homeowner  comes  out  even,  we  require  that 

total  expenditures  = total  income 

for each of the homeowners for the 10-day period. For example, the carpenter pays a total of \(2p_1 + p_2 + 6p_3\) for the repairs in his own home and receives a total income of \(10p_1\) for the repairs that he performs on all three homes. Equating these two expressions then gives the first of the following three equations:

\[
\begin{aligned}
2p_1 + p_2 + 6p_3 &= 10p_1 \\
4p_1 + 5p_2 + p_3 &= 10p_2 \\
4p_1 + 4p_2 + 3p_3 &= 10p_3
\end{aligned}
\]

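The remaining two equations are the equilibrium equations for the electrician and the plumber. Dividing these equations by 10 and rewriting them in matrix form yields

\[
\begin{bmatrix} .2 & .1 & .6 \\ .4 & .5 & .1 \\ .4 & .4 & .3 \end{bmatrix}
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}
=
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}
\tag{1}
\]

Equation 1 can be rewritten as a homogeneous system by subtracting the left side from the right side to obtain

\[
\begin{bmatrix} .8 & -.1 & -.6 \\ -.4 & .5 & -.1 \\ -.4 & -.4 & .7 \end{bmatrix}
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]

The solution of this homogeneous system is found to be (verify)

\[
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}
= s \begin{bmatrix} 31 \\ 32 \\ 36 \end{bmatrix}
\]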

where  s is  an  arbitrary  constant.  This  constant  is  a scale  factor,  which  the  homeowners  may  choose  for  their 
convenience.  For  example,  they  can  set  s = 3 so  that  the  corresponding  daily  wages — $93,  $96,  and  $108 — are  about 
$100. 
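The "(verify)" step in Example 1 can be checked by machine. The following is a minimal sketch, not part of the original text: it row-reduces \(I - E\) with exact fractions and reads off the one-parameter solution. The helper name `null_vector` is hypothetical, and the code assumes the solution space is one-dimensional, as it is here.

```python
from fractions import Fraction as F
from math import lcm  # available in Python 3.9 and later

def null_vector(A):
    """Return one nonzero solution of A x = 0 for a square matrix A whose
    solution space is one-dimensional (Gauss-Jordan with exact fractions)."""
    n = len(A)
    M = [[F(v) for v in row] for row in A]
    pivots, r = [], 0
    for c in range(n):
        pr = next((i for i in range(r, n) if M[i][c] != 0), None)
        if pr is None:
            continue                      # no pivot in this column: free variable
        M[r], M[pr] = M[pr], M[r]
        piv = M[r][c]
        M[r] = [v / piv for v in M[r]]
        for i in range(n):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n) if c not in pivots]
    assert len(free) == 1, "expected a one-dimensional solution space"
    x = [F(0)] * n
    x[free[0]] = F(1)                     # set the free variable to 1
    for row, c in enumerate(pivots):
        x[c] = -M[row][free[0]]           # back out the pivot variables
    return x

# Exchange matrix of Example 1, entered as exact tenths.
E = [[F(2, 10), F(1, 10), F(6, 10)],
     [F(4, 10), F(5, 10), F(1, 10)],
     [F(4, 10), F(4, 10), F(3, 10)]]
I_minus_E = [[(1 if i == j else 0) - E[i][j] for j in range(3)] for i in range(3)]

p = null_vector(I_minus_E)
scale = lcm(*(v.denominator for v in p))  # clear denominators
base = [int(v * scale) for v in p]        # smallest integer price vector
wages = [3 * b for b in base]             # the choice s = 3 from the example
print(base, wages)
```

Exact fractions are used instead of floating point so that the dependent third row reduces to exactly zero rather than to a small rounding residue.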


This  example  illustrates  the  salient  features  of  the  Leontief  input-output  model  of  a closed  economy.  In  the  basic  Equation  1 , each 
column  sum  of  the  coefficient  matrix  is  1,  corresponding  to  the  fact  that  each  of  the  homeowners'  “output”  of  labor  is  completely 
distributed  among  these  same  homeowners  in  the  proportions  given  by  the  entries  in  the  column.  Our  problem  is  to  determine  suitable 
“prices”  for  these  outputs  so  as  to  put  the  system  in  equilibrium — that  is,  so  that  each  homeowner's  total  expenditures  equal  his  or  her 
total  income. 

In  the  general  model  we  have  an  economic  system  consisting  of  a finite  number  of  “industries,”  which  we  number  as  industries 
\(1, 2, \ldots, k\). Over some fixed period of time, each industry produces an “output” of some good or service that is completely utilized in a
predetermined  manner  by  the  k industries.  An  important  problem  is  to  find  suitable  “prices”  to  be  charged  for  these  k outputs  so  that 
for  each  industry,  total  expenditures  equal  total  income.  Such  a price  structure  represents  an  equilibrium  position  for  the  economy. 

For  the  fixed  time  period  in  question,  let  us  set 

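\(p_i\) = price charged by the \(i\)th industry for its total output

\(e_{ij}\) = fraction of the total output of the \(j\)th industry purchased by the \(i\)th industry

for \(i, j = 1, 2, \ldots, k\). By definition, we have

(i) \(p_i \ge 0, \quad i = 1, 2, \ldots, k\)

(ii) \(e_{ij} \ge 0, \quad i, j = 1, 2, \ldots, k\)

(iii) \(e_{1j} + e_{2j} + \cdots + e_{kj} = 1, \quad j = 1, 2, \ldots, k\)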

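With these quantities, we form the price vector

\[
\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_k \end{bmatrix}
\]

and the exchange matrix or input-output matrix

\[
E = \begin{bmatrix}
e_{11} & e_{12} & \cdots & e_{1k} \\
e_{21} & e_{22} & \cdots & e_{2k} \\
\vdots & \vdots & & \vdots \\
e_{k1} & e_{k2} & \cdots & e_{kk}
\end{bmatrix}
\]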

Condition (iii) expresses the fact that all the column sums of the exchange matrix are 1.

As in the example, in order that the expenditures of each industry be equal to its income, the following matrix equation must be satisfied [see 1]:

\[
E\mathbf{p} = \mathbf{p} \tag{2}
\]

or

\[
(I - E)\mathbf{p} = \mathbf{0} \tag{3}
\]

Equation 3 is a homogeneous linear system for the price vector \(\mathbf{p}\). It will have a nontrivial solution if and only if the determinant of its coefficient matrix \(I - E\) is zero. In Exercise 7 we ask you to show that this is the case for any exchange matrix \(E\). Thus, 3 always has nontrivial solutions for the price vector \(\mathbf{p}\).

Actually, for our economic model to make sense, we need more than just the fact that 3 has nontrivial solutions for \(\mathbf{p}\). We also need the prices \(p_i\) of the \(k\) outputs to be nonnegative numbers. We express this condition as \(\mathbf{p} \ge 0\). (In general, if \(A\) is any vector or matrix, the notation \(A \ge 0\) means that every entry of \(A\) is nonnegative, and the notation \(A > 0\) means that every entry of \(A\) is positive. Similarly, \(A \ge B\) means \(A - B \ge 0\), and \(A > B\) means \(A - B > 0\).) To show that 3 has a nontrivial solution for which \(\mathbf{p} \ge 0\) is a bit more difficult than showing merely that some nontrivial solution exists. But it is true, and we state this fact without proof in the following theorem.


THEOREM  10.8.1 

If \(E\) is an exchange matrix, then \(E\mathbf{p} = \mathbf{p}\) always has a nontrivial solution \(\mathbf{p}\) whose entries are nonnegative.


Let  us  consider  a few  simple  examples  of  this  theorem. 

EXAMPLE  2 Using  Theorem  10.8.1 


Let 


Then \((I - E)\mathbf{p} = \mathbf{0}\) is


which has the general solution

\[
\mathbf{p} = s \begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]

where \(s\) is an arbitrary constant. We then have nontrivial solutions \(\mathbf{p} \ge 0\) for any \(s > 0\).


EXAMPLE  3 Using  Theorem  10.8.1 


Then \((I - E)\mathbf{p} = \mathbf{0}\) has the general solution


where \(s\) and \(t\) are independent arbitrary constants. Nontrivial solutions \(\mathbf{p} \ge 0\) then result from any \(s \ge 0\) and \(t \ge 0\), not both zero.


Example  2 indicates  that  in  some  situations  one  of  the  prices  must  be  zero  in  order  to  satisfy  the  equilibrium  condition.  Example  3 
indicates  that  there  may  be  several  linearly  independent  price  structures  available.  Neither  of  these  situations  describes  a truly 
interdependent  economic  structure.  The  following  theorem  gives  sufficient  conditions  for  both  cases  to  be  excluded. 


THEOREM  10.8.2 

Let \(E\) be an exchange matrix such that for some positive integer \(m\) all the entries of \(E^m\) are positive. Then there is exactly one linearly independent solution of \((I - E)\mathbf{p} = \mathbf{0}\), and it may be chosen so that all its entries are positive.


We will not give a proof of this theorem. If you have read Section 10.5 on Markov chains, observe that this theorem is essentially the same as Theorem 10.5.4. What we are calling exchange matrices in this section were called stochastic or Markov matrices in Section 10.5.


EXAMPLE  4 Using  Theorem  10.8.2 


The exchange matrix in Example 1 was

\[
E = \begin{bmatrix} .2 & .1 & .6 \\ .4 & .5 & .1 \\ .4 & .4 & .3 \end{bmatrix}
\]

Because \(E > 0\), the condition \(E^m > 0\) in Theorem 10.8.2 is satisfied for \(m = 1\). Consequently, we are guaranteed that there is exactly one linearly independent solution of \((I - E)\mathbf{p} = \mathbf{0}\), and it can be chosen so that \(\mathbf{p} > 0\). In that example, we found that

\[
\mathbf{p} = \begin{bmatrix} 31 \\ 32 \\ 36 \end{bmatrix}
\]

is such a solution.
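The hypothesis of Theorem 10.8.2, that some power \(E^m\) has only positive entries, is easy to test numerically. The sketch below is illustrative only: the function names and the cutoff `max_power` are assumptions, and the second matrix `E_small` is a made-up two-industry exchange matrix used to show a case where \(m > 1\) is needed.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def first_positive_power(E, max_power=10):
    """Smallest m <= max_power for which every entry of E^m is positive,
    or None if no such power is found within the cutoff."""
    P = E
    for m in range(1, max_power + 1):
        if all(v > 0 for row in P for v in row):
            return m
        P = mat_mul(P, E)
    return None

# Exchange matrix of Example 1: already positive, so m = 1.
E1 = [[F(2, 10), F(1, 10), F(6, 10)],
      [F(4, 10), F(5, 10), F(1, 10)],
      [F(4, 10), F(4, 10), F(3, 10)]]

# Hypothetical exchange matrix with a zero entry (columns still sum to 1).
E_small = [[F(0), F(1, 2)],
           [F(1), F(1, 2)]]

# The identity matrix is an exchange matrix whose powers never become positive.
I2 = [[F(1), F(0)], [F(0), F(1)]]

m1 = first_positive_power(E1)
m2 = first_positive_power(E_small)
m3 = first_positive_power(I2)
print(m1, m2, m3)
```

A return value of `None` only says that no positive power was found up to the cutoff; it does not by itself prove that none exists.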


Leontief  Open  (Production)  Model 

In  contrast  with  the  closed  model,  in  which  the  outputs  of  k industries  are  distributed  only  among  themselves,  the  open  model  attempts 
to  satisfy  an  outside  demand  for  the  outputs.  Portions  of  these  outputs  can  still  be  distributed  among  the  industries  themselves,  to  keep 
them  operating,  but  there  is  to  be  some  excess,  some  net  production,  with  which  to  satisfy  the  outside  demand.  In  the  closed  model  the 
outputs  of  the  industries  are  fixed,  and  our  objective  is  to  determine  prices  for  these  outputs  so  that  the  equilibrium  condition,  that 
expenditures  equal  incomes,  is  satisfied.  In  the  open  model  it  is  the  prices  that  are  fixed,  and  our  objective  is  to  determine  levels  of  the 
outputs  of  the  industries  needed  to  satisfy  the  outside  demand.  We  will  measure  the  levels  of  the  outputs  in  terms  of  their  economic 
values  using  the  fixed  prices.  To  be  precise,  over  some  fixed  period  of  time,  let 


\(x_i\) = monetary value of the total output of the \(i\)th industry

\(d_i\) = monetary value of the output of the \(i\)th industry needed to satisfy the outside demand

\(c_{ij}\) = monetary value of the output of the \(i\)th industry needed by the \(j\)th industry to produce one unit of monetary value of its own output


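With these quantities, we define the production vector

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}
\]

the demand vector

\[
\mathbf{d} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_k \end{bmatrix}
\]

and the consumption matrix

\[
C = \begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1k} \\
c_{21} & c_{22} & \cdots & c_{2k} \\
\vdots & \vdots & & \vdots \\
c_{k1} & c_{k2} & \cdots & c_{kk}
\end{bmatrix}
\]

By their nature, we have that

\[
\mathbf{x} \ge 0, \quad \mathbf{d} \ge 0, \quad \text{and} \quad C \ge 0
\]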


From the definition of \(c_{ij}\) and \(x_j\) it can be seen that the quantity

\[
c_{i1}x_1 + c_{i2}x_2 + \cdots + c_{ik}x_k
\]

is the value of the output of the \(i\)th industry needed by all \(k\) industries to produce a total output specified by the production vector \(\mathbf{x}\). Because this quantity is simply the \(i\)th entry of the column vector \(C\mathbf{x}\), we can say further that the \(i\)th entry of the column vector

\[
\mathbf{x} - C\mathbf{x}
\]

is the value of the excess output of the \(i\)th industry available to satisfy the outside demand. The value of the outside demand for the output of the \(i\)th industry is the \(i\)th entry of the demand vector \(\mathbf{d}\). Consequently, we are led to the following equation

\[
\mathbf{x} - C\mathbf{x} = \mathbf{d}
\]

or

\[
(I - C)\mathbf{x} = \mathbf{d} \tag{4}
\]

for the demand to be exactly met, without any surpluses or shortages. Thus, given \(C\) and \(\mathbf{d}\), our objective is to find a production vector \(\mathbf{x} \ge 0\) that satisfies Equation 4.

EXAMPLE  5 Production  Vector  for  a Town 

A town  has  three  main  industries:  a coal-mining  operation,  an  electric  power-generating  plant,  and  a local  railroad.  To 
mine  $1  of  coal,  the  mining  operation  must  purchase  $.25  of  electricity  to  run  its  equipment  and  $.25  of  transportation 
for  its  shipping  needs.  To  produce  $1  of  electricity,  the  generating  plant  requires  $.65  of  coal  for  fuel,  $.05  of  its  own 
electricity  to  run  auxiliary  equipment,  and  $.05  of  transportation.  To  provide  $1  of  transportation,  the  railroad  requires 
$.55  of  coal  for  fuel  and  $.10  of  electricity  for  its  auxiliary  equipment.  In  a certain  week  the  coal-mining  operation 
receives  orders  for  $50,000  of  coal  from  outside  the  town,  and  the  generating  plant  receives  orders  for  $25,000  of 
electricity  from  outside.  There  is  no  outside  demand  for  the  local  railroad.  How  much  must  each  of  the  three  industries 
produce  in  that  week  to  exactly  satisfy  their  own  demand  and  the  outside  demand? 

For  the  one-week  period  let 

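\(x_1\) = value of total output of coal-mining operation
\(x_2\) = value of total output of power-generating plant
\(x_3\) = value of total output of local railroad

From the information supplied, the consumption matrix of the system is

\[
C = \begin{bmatrix} 0 & .65 & .55 \\ .25 & .05 & .10 \\ .25 & .05 & 0 \end{bmatrix}
\]

The linear system \((I - C)\mathbf{x} = \mathbf{d}\) is then

\[
\begin{bmatrix} 1.00 & -.65 & -.55 \\ -.25 & .95 & -.10 \\ -.25 & -.05 & 1.00 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
=
\begin{bmatrix} 50{,}000 \\ 25{,}000 \\ 0 \end{bmatrix}
\]

The coefficient matrix on the left is invertible, and the solution is given by

\[
\mathbf{x} = (I - C)^{-1}\mathbf{d}
= \frac{1}{503}
\begin{bmatrix} 756 & 542 & 470 \\ 220 & 690 & 190 \\ 200 & 170 & 630 \end{bmatrix}
\begin{bmatrix} 50{,}000 \\ 25{,}000 \\ 0 \end{bmatrix}
=
\begin{bmatrix} 102{,}087 \\ 56{,}163 \\ 28{,}330 \end{bmatrix}
\]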

Thus,  the  total  output  of  the  coal-mining  operation  should  be  $102,087,  the  total  output  of  the  power-generating  plant 
should  be  $56,163,  and  the  total  output  of  the  railroad  should  be  $28,330. 
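The computation in Example 5 can be reproduced with a short script. This is a sketch under stated assumptions rather than the text's own method: it solves \((I - C)\mathbf{x} = \mathbf{d}\) by Gauss-Jordan elimination with exact fractions, and the helper name `solve` is hypothetical. The outputs are rounded to whole dollars only at the end.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination.
    A is assumed to be square and invertible."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for c in range(n):
        pr = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[pr] = M[pr], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [a - f * p for a, p in zip(M[i], M[c])]
    return [M[i][n] for i in range(n)]

# Consumption matrix and outside demand from Example 5.
C = [[F(0),       F(65, 100), F(55, 100)],
     [F(25, 100), F(5, 100),  F(10, 100)],
     [F(25, 100), F(5, 100),  F(0)]]
d = [F(50000), F(25000), F(0)]

I_minus_C = [[(1 if i == j else 0) - C[i][j] for j in range(3)] for i in range(3)]
x = solve(I_minus_C, d)
rounded = [round(v) for v in x]   # coal, electricity, railroad, in dollars
print(rounded)
```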


Let us reconsider Equation 4:

\[
(I - C)\mathbf{x} = \mathbf{d}
\]

If the square matrix \(I - C\) is invertible, we can write

\[
\mathbf{x} = (I - C)^{-1}\mathbf{d} \tag{5}
\]

In addition, if the matrix \((I - C)^{-1}\) has only nonnegative entries, then we are guaranteed that for any \(\mathbf{d} \ge 0\), Equation 5 has a unique nonnegative solution for \(\mathbf{x}\). This is a particularly desirable situation, as it means that any outside demand can be met. The terminology used to describe this case is given in the following definition.


DEFINITION 1

A consumption matrix \(C\) is said to be productive if \((I - C)^{-1}\) exists and

\[
(I - C)^{-1} \ge 0
\]


We  will  now  consider  some  simple  criteria  that  guarantee  that  a consumption  matrix  is  productive.  The  first  is  given  in  the  following 
theorem. 


THEOREM 10.8.3 Productive Consumption Matrix

A consumption matrix \(C\) is productive if and only if there is some production vector \(\mathbf{x} \ge 0\) such that \(\mathbf{x} > C\mathbf{x}\).


(The  proof  is  outlined  in  Exercise  9.)  The  condition  x > Cx  means  that  there  is  some  production  schedule  possible  such  that  each 
industry  produces  more  than  it  consumes. 

Theorem 10.8.3 has two interesting corollaries. Suppose that all the row sums of \(C\) are less than 1. If

\[
\mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}
\]

then \(C\mathbf{x}\) is a column vector whose entries are these row sums. Therefore, \(\mathbf{x} > C\mathbf{x}\), and the condition of Theorem 10.8.3 is satisfied. Thus, we arrive at the following corollary:


COROLLARY  10.8.4 

A consumption  matrix  is  productive  if  each  of  its  row  sums  is  less  than  1 . 


As  we  ask  you  to  show  in  Exercise  8,  this  corollary  leads  to  the  following: 


COROLLARY  10.8.5 

A consumption  matrix  is  productive  if  each  of  its  column  sums  is  less  than  1 . 


Recalling the definition of the entries of the consumption matrix \(C\), we see that the \(j\)th column sum of \(C\) is the total value of the outputs of all \(k\) industries needed to produce one unit of value of output of the \(j\)th industry. The \(j\)th industry is thus said to be profitable if that \(j\)th column sum is less than 1. In other words, Corollary 10.8.5 says that a consumption matrix is productive if all \(k\) industries in the economic system are profitable.

EXAMPLE  6 Using  Corollary  10.8.5 


The consumption matrix in Example 5 was

\[
C = \begin{bmatrix} 0 & .65 & .55 \\ .25 & .05 & .10 \\ .25 & .05 & 0 \end{bmatrix}
\]

All three column sums in this matrix are less than 1, so all three industries are profitable. Consequently, by Corollary 10.8.5, the consumption matrix \(C\) is productive. This can also be seen in the calculations in Example 5, as \((I - C)^{-1}\) is nonnegative.
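Both ways of seeing that the matrix of Example 5 is productive can be checked directly. The sketch below is an illustration, with the helper name `inverse` chosen for this example: it computes the column sums used by Corollary 10.8.5 and then inverts \(I - C\) exactly to confirm that every entry of the inverse is nonnegative, as required by Definition 1.

```python
from fractions import Fraction as F

def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I].
    Raises StopIteration if A is singular."""
    n = len(A)
    M = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pr = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[pr] = M[pr], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]

# Consumption matrix from Example 5.
C = [[F(0),       F(65, 100), F(55, 100)],
     [F(25, 100), F(5, 100),  F(10, 100)],
     [F(25, 100), F(5, 100),  F(0)]]
n = len(C)

# Corollary 10.8.5: every column sum is less than 1.
col_sums = [sum(C[i][j] for i in range(n)) for j in range(n)]
profitable = all(s < 1 for s in col_sums)

# Definition 1: (I - C)^(-1) exists and has no negative entries.
I_minus_C = [[(1 if i == j else 0) - C[i][j] for j in range(n)] for i in range(n)]
inv = inverse(I_minus_C)
productive = all(v >= 0 for row in inv for v in row)

print(col_sums, profitable, productive)
```

The column-sum test is only a sufficient condition, so a matrix that fails it may still be productive; the inverse check is the definition itself.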


Exercise  Set  10.8 


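1. For the following exchange matrices, find nonnegative price vectors that satisfy the equilibrium condition 3.

(a) \(\begin{bmatrix} \frac12 & \frac13 \\[2pt] \frac12 & \frac23 \end{bmatrix}\)

(b) \(\begin{bmatrix} \frac12 & 0 & \frac12 \\[2pt] \frac13 & 0 & \frac12 \\[2pt] \frac16 & 1 & 0 \end{bmatrix}\)

(c) \(\begin{bmatrix} .35 & .50 & .30 \\ .25 & .20 & .30 \\ .40 & .30 & .40 \end{bmatrix}\)

Answer:

(a) \(\begin{bmatrix} 2 \\ 3 \end{bmatrix}\)

(b) \(\begin{bmatrix} 6 \\ 5 \\ 6 \end{bmatrix}\)

(c) \(\begin{bmatrix} 78 \\ 54 \\ 79 \end{bmatrix}\)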

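2. Using Theorem 10.8.3 and its corollaries, show that each of the following consumption matrices is productive.

(a) \(\begin{bmatrix} .8 & .1 \\ .3 & .6 \end{bmatrix}\)

(b) \(\begin{bmatrix} .70 & .30 & .25 \\ .20 & .40 & .25 \\ .05 & .15 & .25 \end{bmatrix}\)

(c) \(\begin{bmatrix} .7 & .3 & .2 \\ .1 & .4 & .3 \\ .2 & .4 & .1 \end{bmatrix}\)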

Answer: 


Answer: 

g'*  has  all  positive  entries. 

4. Three neighbors have backyard vegetable gardens. Neighbor A grows tomatoes, neighbor B grows corn, and neighbor C grows lettuce. They agree to divide their crops among themselves as follows: A gets \(\frac12\) of the tomatoes, \(\frac13\) of the corn, and \(\frac14\) of the lettuce. B gets \(\frac13\) of the tomatoes, \(\frac13\) of the corn, and \(\frac14\) of the lettuce. C gets \(\frac16\) of the tomatoes, \(\frac13\) of the corn, and \(\frac12\) of the lettuce.

What  prices  should  the  neighbors  assign  to  their  respective  crops  if  the  equilibrium  condition  of  a closed  economy  is  to  be  satisfied, 
and  if  the  lowest-priced  crop  is  to  have  a price  of  $100? 


Answer: 


Price  of  tomatoes,  $120.00;  price  of  corn,  $100.00;  price  of  lettuce,  $106.67 

5.  Three  engineers — a civil  engineer  (CE),  an  electrical  engineer  (EE),  and  a mechanical  engineer  (ME) — each  have  a consulting  firm. 
The  consulting  they  do  is  of  a multidisciplinary  nature,  so  they  buy  a portion  of  each  others'  services.  For  each  $1  of  consulting  the 
CE  does,  she  buys  $.10  of  the  EE's  services  and  $.30  of  the  ME's  services.  For  each  $1  of  consulting  the  EE  does,  she  buys  $.20  of 
the  CE's  services  and  $.40  of  the  ME's  services.  And  for  each  $1  of  consulting  the  ME  does,  she  buys  $.30  of  the  CE's  services  and 
$.40  of  the  EE's  services.  In  a certain  week  the  CE  receives  outside  consulting  orders  of  $500,  the  EE  receives  outside  consulting 
orders  of  $700,  and  the  ME  receives  outside  consulting  orders  of  $600.  What  dollar  amount  of  consulting  does  each  engineer 
perform  in  that  week? 


Answer: 


$1256  for  the  CE,  $1448  for  the  EE,  $1556  for  the  ME 


6. (a) Suppose that the demand \(d_i\) for the output of the \(i\)th industry increases by one unit. Explain why the \(i\)th column of the matrix \((I - C)^{-1}\) is the increase that must be made to the production vector \(\mathbf{x}\) to satisfy this additional demand.

(b)  Referring  to  Example  5,  use  the  result  in  part  (a)  to  determine  the  increase  in  the  value  of  the  output  of  the  coal-mining 
operation  needed  to  satisfy  a demand  of  one  additional  unit  in  the  value  of  the  output  of  the  power-generating  plant. 


Answer: 


(b) \(\dfrac{542}{503}\)


7. Using the fact that the column sums of an exchange matrix \(E\) are all 1, show that the column sums of \(I - E\) are zero. From this, show that \(I - E\) has zero determinant, and so \((I - E)\mathbf{p} = \mathbf{0}\) has nontrivial solutions for \(\mathbf{p}\).

8.  Show  that  Corollary  10.8.5  follows  from  Corollary  10.8.4. 

[Hint: Use the fact that \((A^T)^{-1} = (A^{-1})^T\) for any invertible matrix \(A\).]


9. (Calculus required) Prove Theorem 10.8.3 as follows:

(a) Prove the “only if” part of the theorem; that is, show that if \(C\) is a productive consumption matrix, then there is a vector \(\mathbf{x} \ge 0\) such that \(\mathbf{x} > C\mathbf{x}\).

(b) Prove the “if” part of the theorem as follows:

Step 1 Show that if there is a vector \(\mathbf{x}^* \ge 0\) such that \(C\mathbf{x}^* < \mathbf{x}^*\), then \(\mathbf{x}^* > 0\).
Step 2 Show that there is a number \(\lambda\) such that \(0 < \lambda < 1\) and \(C\mathbf{x}^* < \lambda \mathbf{x}^*\).
Step 3 Show that \(C^n \mathbf{x}^* < \lambda^n \mathbf{x}^*\) for \(n = 1, 2, \ldots\).

Step 4 Show that \(C^n \to 0\) as \(n \to \infty\).
Step 5 By multiplying out, show that

\[
(I - C)(I + C + C^2 + \cdots + C^{n-1}) = I - C^n
\]

for \(n = 1, 2, \ldots\).

Step 6 By letting \(n \to \infty\) in Step 5, show that the matrix infinite sum

\[
S = I + C + C^2 + \cdots
\]

exists and that \((I - C)S = I\).

Step 7 Show that \(S \ge 0\) and that \(S = (I - C)^{-1}\).

Step 8 Show that \(C\) is a productive consumption matrix.


Technology  Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple,
Derive,  or  Mathcad,  but  it  may  also  be  some  other  type  of  linear  algebra  software  or  a scientific  calculator  with  some  linear  algebra 
capabilities.  For  each  exercise  you  will  need  to  read  the  relevant  documentation  for  the  particular  utility  you  are  using.  The  goal  of 
these  exercises  is  to  provide  you  with  a basic  proficiency  with  your  technology  utility.  Once  you  have  mastered  the  techniques  in 
these  exercises,  you  will  be  able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the  regular  exercise  sets. 

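T1. Consider a sequence of exchange matrices \(\{E_2, E_3, E_4, E_5, \ldots, E_n\}\), where

\[
E_2 = \begin{bmatrix} 0 & \frac12 \\[2pt] 1 & \frac12 \end{bmatrix}, \quad
E_3 = \begin{bmatrix} 0 & \frac12 & \frac13 \\[2pt] 1 & 0 & \frac13 \\[2pt] 0 & \frac12 & \frac13 \end{bmatrix}, \quad
E_4 = \begin{bmatrix} 0 & \frac12 & \frac13 & \frac14 \\[2pt] 1 & 0 & \frac13 & \frac14 \\[2pt] 0 & \frac12 & 0 & \frac14 \\[2pt] 0 & 0 & \frac13 & \frac14 \end{bmatrix}, \quad
E_5 = \begin{bmatrix} 0 & \frac12 & \frac13 & \frac14 & \frac15 \\[2pt] 1 & 0 & \frac13 & \frac14 & \frac15 \\[2pt] 0 & \frac12 & 0 & \frac14 & \frac15 \\[2pt] 0 & 0 & \frac13 & 0 & \frac15 \\[2pt] 0 & 0 & 0 & \frac14 & \frac15 \end{bmatrix}
\]

and so on. Use a computer to show that \(E_2^2 > 0_2\), \(E_3^3 > 0_3\), \(E_4^4 > 0_4\), \(E_5^5 > 0_5\), and make the conjecture that although \(E_n^n > 0_n\) is true, \(E_n^k > 0_n\) is not true for \(k = 1, 2, 3, \ldots, n - 1\). Next, use a computer to determine the vectors \(\mathbf{p}_n\) such that \(E_n \mathbf{p}_n = \mathbf{p}_n\) (for \(n = 2, 3, 4, 5, 6\)), and then see if you can discover a pattern that would allow you to compute \(\mathbf{p}_{n+1}\) easily from \(\mathbf{p}_n\). Test your discovery by first constructing \(\mathbf{p}_8\) from

\[
\mathbf{p}_7 = \begin{bmatrix} 2520 \\ 3360 \\ 1890 \\ 672 \\ 175 \\ 36 \\ 7 \end{bmatrix}
\]

and then checking to see whether \(E_8 \mathbf{p}_8 = \mathbf{p}_8\).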

T2. Consider an open production model having \(n\) industries with \(n > 1\). In order to produce $1 of its own output, the \(j\)th industry must spend $\((1/n)\) for the output of the \(i\)th industry (for all \(i \ne j\)), but the \(j\)th industry (for all \(j = 1, 2, 3, \ldots, n\)) spends nothing for its own output. Construct the consumption matrix \(C_n\), show that it is productive, and determine an expression for \((I_n - C_n)^{-1}\). In determining an expression for \((I_n - C_n)^{-1}\), use a computer to study the cases when \(n = 2, 3, 4\), and 5; then make a conjecture and prove your conjecture to be true. [Hint: If \(F_n = [1]_{n \times n}\) (i.e., the \(n \times n\) matrix with every entry equal to 1), first show that

\[
F_n^2 = nF_n
\]

and then express your value of \((I_n - C_n)^{-1}\) in terms of \(n\), \(I_n\), and \(F_n\).]
10.9 Forest Management

In  this  section  we  discuss  a matrix  model  for  the  management  of  a forest  where  trees  are  grouped  into  classes  according  to  height. 
The  optimal  sustainable  yield  of  a periodic  harvest  is  calculated  when  the  trees  of  different  height  classes  can  have  different 
economic  values. 


Prerequisites 

Matrix  Operations 


Optimal  Sustainable  Yield 

Our  objective  is  to  introduce  a simplified  model  for  the  sustainable  harvesting  of  a forest  whose  trees  are  classified  by  height.  The 
height  of  a tree  is  assumed  to  determine  its  economic  value  when  it  is  cut  down  and  sold.  Initially,  there  is  a distribution  of  trees 
of  various  heights.  The  forest  is  then  allowed  to  grow  for  a certain  period  of  time,  after  which  some  of  the  trees  of  various  heights 
are  harvested.  The  trees  left  unharvested  are  to  be  of  the  same  height  configuration  as  the  original  forest,  so  that  the  harvest  is 
sustainable.  As  we  will  see,  there  are  many  such  sustainable  harvesting  procedures.  We  want  to  find  one  for  which  the  total 
economic  value  of  all  the  trees  removed  is  as  large  as  possible.  This  determines  the  optimal  sustainable  yield  of  the  forest  and  is 
the  largest  yield  that  can  be  attained  continually  without  depleting  the  forest. 


The  Model 

Suppose  that  a harvester  has  a forest  of  Douglas  fir  trees  that  are  to  be  sold  as  Christmas  trees  year  after  year.  Every  December 
the  harvester  cuts  down  some  of  the  trees  to  be  sold.  For  each  tree  cut  down,  a seedling  is  planted  in  its  place.  In  this  way  the  total 
number  of  trees  in  the  forest  is  always  the  same.  (In  this  simplified  model,  we  will  not  take  into  account  trees  that  die  between 
harvests.  We  assume  that  every  seedling  planted  survives  and  grows  until  it  is  harvested.) 

In the marketplace, trees of different heights have different economic values. Suppose that there are \(n\) different price classes corresponding to certain height intervals, as shown in Table 1 and Figure 10.9.1. The first class consists of seedlings with heights in the interval \([0, h_1)\), and these seedlings are of no economic value. The \(n\)th class consists of trees with heights greater than or equal to \(h_{n-1}\).


Figure  10.9.1 


Table 1

Class            Value (dollars)    Height Interval
1 (seedlings)    None               [0, h_1)
2                p_2                [h_1, h_2)
3                p_3                [h_2, h_3)
...              ...                ...
n - 1            p_{n-1}            [h_{n-2}, h_{n-1})
n                p_n                [h_{n-1}, ∞)

Let \(x_i\) (\(i = 1, 2, \ldots, n\)) be the number of trees within the \(i\)th class that remain after each harvest. We form a column vector with the numbers and call it the nonharvest vector:

\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]


For  a sustainable  harvesting  policy,  the  forest  is  to  be  returned  after  each  harvest  to  the  fixed  configuration  given  by  the 
nonharvest  vector  x.  Part  of  our  problem  is  to  find  those  nonharvest  vectors  x for  which  sustainable  harvesting  is  possible. 


Because  the  total  number  of  trees  in  the  forest  is  fixed,  we  can  set 


\[
x_1 + x_2 + \cdots + x_n = s \tag{1}
\]


where  s is  predetermined  by  the  amount  of  land  available  and  the  amount  of  space  each  tree  requires.  Referring  to  Figure  10.9.2, 
we  have  the  following  situation.  The  forest  configuration  is  given  by  the  vector  x after  each  harvest.  Between  harvests  the  trees 
grow  and  produce  a new  forest  configuration  before  each  harvest.  A certain  number  of  trees  are  removed  from  each  class  at  the 
harvest.  Finally,  a seedling  is  planted  in  place  of  each  tree  removed,  to  return  the  forest  again  to  the  configuration  x. 


[Figure: diagram of the harvest cycle. Panel titles: Forest before growth (nonharvest vector x); Forest after growth; Trees not removed; Forest after harvest (nonharvest vector x). The first and last panels show the same forest configuration.]

Figure 10.9.2


Consider first the growth of the forest between harvests. During this period a tree in the \(i\)th class may grow and move up to a higher height class. Or its growth may be retarded for some reason, and it will remain in the same class. We consequently define the following growth parameters \(g_i\) for \(i = 1, 2, \ldots, n - 1\):

\(g_i\) = the fraction of trees in the \(i\)th class that grow into the \((i + 1)\)-st class during a growth period


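For simplicity we assume that a tree can move at most one height class upward in one growth period. With this assumption, we have

\(1 - g_i\) = the fraction of trees in the \(i\)th class that remain in the \(i\)th class during a growth period

With these \(n - 1\) growth parameters, we form the following \(n \times n\) growth matrix:

\[
G = \begin{bmatrix}
1 - g_1 & 0 & 0 & \cdots & 0 & 0 \\
g_1 & 1 - g_2 & 0 & \cdots & 0 & 0 \\
0 & g_2 & 1 - g_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 - g_{n-1} & 0 \\
0 & 0 & 0 & \cdots & g_{n-1} & 1
\end{bmatrix}
\tag{2}
\]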


Because the entries of the vector \(\mathbf{x}\) are the numbers of trees in the \(n\) classes before the growth period, you can verify that the entries of the vector

\[
G\mathbf{x} = \begin{bmatrix}
(1 - g_1)x_1 \\
g_1 x_1 + (1 - g_2)x_2 \\
g_2 x_2 + (1 - g_3)x_3 \\
\vdots \\
g_{n-2}x_{n-2} + (1 - g_{n-1})x_{n-1} \\
g_{n-1}x_{n-1} + x_n
\end{bmatrix}
\tag{3}
\]

are the numbers of trees in the \(n\) classes after the growth period.


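Suppose that during the harvest we remove \(y_i\) (\(i = 1, 2, \ldots, n\)) trees from the \(i\)th class. We will call the column vector

\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\]

the harvest vector. Thus, a total of

\[
y_1 + y_2 + \cdots + y_n
\]

trees are removed at each harvest. This is also the total number of trees added to the first class (the new seedlings) after each harvest. If we define the following \(n \times n\) replacement matrix

\[
R = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix}
\tag{4}
\]

then the column vector

\[
R\mathbf{y} = \begin{bmatrix} y_1 + y_2 + \cdots + y_n \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\tag{5}
\]

specifies the configuration of trees planted after each harvest.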


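At this point we are ready to write the following equation, which characterizes a sustainable harvesting policy:

\[
\begin{bmatrix} \text{configuration at end of} \\ \text{growth period} \end{bmatrix}
- [\text{harvest}]
+ \begin{bmatrix} \text{new seedling} \\ \text{replacement} \end{bmatrix}
= \begin{bmatrix} \text{configuration at beginning of} \\ \text{growth period} \end{bmatrix}
\]

or mathematically,

\[
G\mathbf{x} - \mathbf{y} + R\mathbf{y} = \mathbf{x}
\]

This equation can be rewritten as

\[
(I - R)\mathbf{y} = (G - I)\mathbf{x} \tag{6}
\]

or more comprehensively as

\[
\begin{bmatrix}
0 & -1 & -1 & \cdots & -1 & -1 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0 \\
0 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix}
=
\begin{bmatrix}
-g_1 & 0 & 0 & \cdots & 0 & 0 \\
g_1 & -g_2 & 0 & \cdots & 0 & 0 \\
0 & g_2 & -g_3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & -g_{n-1} & 0 \\
0 & 0 & 0 & \cdots & g_{n-1} & 0
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{n-1} \\ x_n \end{bmatrix}
\]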

We will refer to Equation 6 as the sustainable harvesting condition. Any vectors \(\mathbf{x}\) and \(\mathbf{y}\) with nonnegative entries, and such that \(x_1 + x_2 + \cdots + x_n = s\), which satisfy this matrix equation, determine a sustainable harvesting policy for the forest. Note that if \(y_1 > 0\), then the harvester is removing seedlings of no economic value and replacing them with new seedlings. Because there is no point in doing this, we assume that

\[
y_1 = 0 \tag{7}
\]

With this assumption, it can be verified that 6 is the matrix form of the following set of equations:

\[
\begin{aligned}
y_2 + y_3 + \cdots + y_n &= g_1 x_1 \\
y_2 &= g_1 x_1 - g_2 x_2 \\
y_3 &= g_2 x_2 - g_3 x_3 \\
&\;\;\vdots \\
y_{n-1} &= g_{n-2} x_{n-2} - g_{n-1} x_{n-1} \\
y_n &= g_{n-1} x_{n-1}
\end{aligned}
\tag{8}
\]

Note that the first equation in 8 is the sum of the remaining \(n - 1\) equations.

Because we must have \(y_i \ge 0\) for \(i = 2, 3, \ldots, n\), Equations 8 require that

\[
g_1 x_1 \ge g_2 x_2 \ge \cdots \ge g_{n-1} x_{n-1} \ge 0 \tag{9}
\]

Conversely,  if  x is  a column  vector  with  nonnegative  entries  that  satisfy  Equation  9,  then  7 and  8 define  a column  vector  y with 
nonnegative  entries.  Furthermore,  x and  y then  satisfy  the  sustainable  harvesting  condition  6.  In  other  words,  a necessary  and 
sufficient  condition  for  a nonnegative  column  vector  x to  determine  a forest  configuration  that  is  capable  of  sustainable 
harvesting  is  that  its  entries  satisfy  9. 
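The equivalence just stated can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the growth matrix G has diagonal entries 1 − g_i (with a final diagonal entry of 1) and subdiagonal entries g_i, as in the matrix of Example 1, and the function names and sample numbers are hypothetical.

```python
def growth_matrix(g):
    """Build G (assumed form): a fraction 1 - g_i stays in class i, g_i moves up."""
    n = len(g) + 1
    G = [[0.0] * n for _ in range(n)]
    for i, gi in enumerate(g):
        G[i][i] = 1.0 - gi
        G[i + 1][i] = gi
    G[n - 1][n - 1] = 1.0  # trees in the tallest class stay there
    return G


def is_sustainable(g, x, tol=1e-12):
    """Equation (9): g_1 x_1 >= g_2 x_2 >= ... >= g_{n-1} x_{n-1} >= 0."""
    chain = [gi * xi for gi, xi in zip(g, x)] + [0.0]
    return all(a >= b - tol for a, b in zip(chain, chain[1:]))


def harvest_vector(g, x):
    """Equations (7) and (8): y_1 = 0 and y_i = g_{i-1} x_{i-1} - g_i x_i."""
    chain = [gi * xi for gi, xi in zip(g, x)] + [0.0]
    return [0.0] + [chain[i - 1] - chain[i] for i in range(1, len(x))]


def condition_residual(g, x, y):
    """Entries of G x - y + R y - x; all zero for a sustainable policy."""
    G = growth_matrix(g)
    n = len(x)
    Gx = [sum(G[i][j] * x[j] for j in range(n)) for i in range(n)]
    Ry = [sum(y)] + [0.0] * (n - 1)  # all harvested trees return as seedlings
    return [Gx[i] - y[i] + Ry[i] - x[i] for i in range(n)]


g = [0.5, 0.25]            # hypothetical growth parameters g_1, g_2
x = [400.0, 400.0, 200.0]  # hypothetical configuration with s = 1000 trees
y = harvest_vector(g, x)
print(is_sustainable(g, x), y, condition_residual(g, x, y))
```

For this sample forest the chain g_1 x_1 = 200 ≥ g_2 x_2 = 100 ≥ 0 holds, so the computed y has nonnegative entries and the residual of the sustainable harvesting condition vanishes.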


Optimal  Sustainable  Yield 

Because we remove \(y_i\) trees from the ith class (i = 2, 3, ..., n) and each tree in the ith class has an economic value of \(p_i\), the
total yield of the harvest, Yld, is given by

\[
\text{Yld} = p_2y_2 + p_3y_3 + \cdots + p_ny_n \tag{10}
\]

Using 8, we may substitute for the \(y_i\)'s in 10 to obtain

\[
\text{Yld} = p_2g_1x_1 + (p_3 - p_2)g_2x_2 + \cdots + (p_n - p_{n-1})g_{n-1}x_{n-1} \tag{11}
\]


Combining 11, 1, and 9, we can now state the problem of maximizing the yield of the forest over all possible sustainable
harvesting policies as follows:

Problem

Find nonnegative numbers \(x_1, x_2, \ldots, x_n\) that maximize

\[
\text{Yld} = p_2g_1x_1 + (p_3 - p_2)g_2x_2 + \cdots + (p_n - p_{n-1})g_{n-1}x_{n-1}
\]

subject to

\[
x_1 + x_2 + \cdots + x_n = s
\]

and

\[
g_1x_1 \ge g_2x_2 \ge \cdots \ge g_{n-1}x_{n-1} \ge 0
\]

As  formulated  above,  this  problem  belongs  to  the  field  of  linear  programming.  However,  we  will  illustrate  the  following  result, 
without  linear  programming  theory,  by  actually  exhibiting  a sustainable  harvesting  policy. 


Optimal  Sustainable  Yield 

The  optimal  sustainable  yield  is  achieved  by  harvesting  all  the  trees  from  one  particular  height  class  and  none  of  the  trees 
from  any  other  height  class. 


Let  us  first  set 

\(\text{Yld}_k\) = yield obtained by harvesting all of the kth class and none of the other classes

The largest value of \(\text{Yld}_k\) for k = 2, 3, ..., n will then be the optimal sustainable yield, and the corresponding value of k will be
the class that should be completely harvested to attain the optimal sustainable yield. Because no class but the kth is harvested, we
have

\[
y_2 = y_3 = \cdots = y_{k-1} = y_{k+1} = \cdots = y_n = 0 \tag{12}
\]

In addition, because all of the kth class is harvested, no trees are ever present in the height classes above the kth class. Thus,

\[
x_k = x_{k+1} = \cdots = x_n = 0 \tag{13}
\]

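Substituting 12 and 13 into the sustainable harvesting condition 8 gives

\[
\begin{aligned}
y_k &= g_1x_1 \\
0 &= g_1x_1 - g_2x_2 \\
0 &= g_2x_2 - g_3x_3 \\
&\;\;\vdots \\
0 &= g_{k-2}x_{k-2} - g_{k-1}x_{k-1} \\
y_k &= g_{k-1}x_{k-1}
\end{aligned} \tag{14}
\]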


Equations 14 can also be written as

\[
y_k = g_1x_1 = g_2x_2 = \cdots = g_{k-1}x_{k-1} \tag{15}
\]


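from which it follows that

\[
x_2 = g_1x_1/g_2, \quad x_3 = g_1x_1/g_3, \quad \ldots, \quad x_{k-1} = g_1x_1/g_{k-1} \tag{16}
\]

If we substitute Equations 13 and 16 into

\[
x_1 + x_2 + \cdots + x_n = s
\]

[which is Equation 1], we can solve for \(x_1\) and obtain

\[
x_1 = \frac{s}{1 + \dfrac{g_1}{g_2} + \dfrac{g_1}{g_3} + \cdots + \dfrac{g_1}{g_{k-1}}} \tag{17}
\]

For the yield \(\text{Yld}_k\), we combine 10, 12, 15, and 17 to obtain

\[
\begin{aligned}
\text{Yld}_k &= p_2y_2 + p_3y_3 + \cdots + p_ny_n \\
&= p_ky_k \\
&= p_kg_1x_1 \\
&= \frac{p_ks}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}
\end{aligned} \tag{18}
\]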


Equation 18 determines \(\text{Yld}_k\) in terms of the known growth and economic parameters for any k = 2, 3, ..., n. Thus, the optimal
sustainable yield is found as follows.


Finding the Optimal Sustainable Yield

The optimal sustainable yield is the largest value of

\[
\frac{p_ks}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}
\]

for k = 2, 3, ..., n. The corresponding value of k is the number of the class that is completely harvested.


In Exercise 4 we ask you to show that the nonharvest vector x for the optimal sustainable yield is

\[
\mathbf{x} = \frac{s}{\dfrac{1}{g_1} + \dfrac{1}{g_2} + \cdots + \dfrac{1}{g_{k-1}}}
\begin{bmatrix} 1/g_1 \\ 1/g_2 \\ \vdots \\ 1/g_{k-1} \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \tag{19}
\]

Theorem 10.9.2 implies that it is not necessarily the highest-priced class of trees that should be totally cropped. The growth
parameters \(g_i\) must also be taken into account to determine the optimal sustainable yield.


EXAMPLE  1 Using  Theorem  10.9.2 


For a Scots pine forest in Scotland with a growth period of six years, the following growth matrix was found (see
M. B. Usher, “A Matrix Approach to the Management of Renewable Resources, with Special Reference to Selection
Forests,” Journal of Applied Ecology, vol. 3, 1966, pp. 355-367):

\[
G = \begin{bmatrix}
.72 & 0 & 0 & 0 & 0 & 0 \\
.28 & .69 & 0 & 0 & 0 & 0 \\
0 & .31 & .75 & 0 & 0 & 0 \\
0 & 0 & .25 & .77 & 0 & 0 \\
0 & 0 & 0 & .23 & .63 & 0 \\
0 & 0 & 0 & 0 & .37 & 1.00
\end{bmatrix}
\]

Suppose that the prices of trees in the five tallest height classes are

\[
p_2 = \$50, \quad p_3 = \$100, \quad p_4 = \$150, \quad p_5 = \$200, \quad p_6 = \$250
\]
Which  class  should  be  completely  harvested  to  obtain  the  optimal  sustainable  yield,  and  what  is  that  yield? 


From matrix G we have that

\[
g_1 = .28, \quad g_2 = .31, \quad g_3 = .25, \quad g_4 = .23, \quad g_5 = .37
\]

Equation 18 then gives

\[
\begin{aligned}
\text{Yld}_2 &= 50s/(.28^{-1}) = 14.0s \\
\text{Yld}_3 &= 100s/(.28^{-1} + .31^{-1}) = 14.7s \\
\text{Yld}_4 &= 150s/(.28^{-1} + .31^{-1} + .25^{-1}) = 13.9s \\
\text{Yld}_5 &= 200s/(.28^{-1} + .31^{-1} + .25^{-1} + .23^{-1}) = 13.2s \\
\text{Yld}_6 &= 250s/(.28^{-1} + .31^{-1} + .25^{-1} + .23^{-1} + .37^{-1}) = 14.0s
\end{aligned}
\]

We see that \(\text{Yld}_3\) is the largest of these five quantities, so from Theorem 10.9.2 the third class should be completely
harvested every six years to maximize the sustainable yield. The corresponding optimal sustainable yield is $14.7s,
where s is the total number of trees in the forest.
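The arithmetic of Example 1 can be reproduced with a short script. This is a sketch under stated assumptions rather than part of the original text: the function names are hypothetical, the yields are reported per tree (s = 1), and the nonharvest vector follows Equation 19.

```python
def sustainable_yields(g, p, s=1.0):
    """Yld_k for k = 2, ..., n by Equation (18).

    g = [g_1, ..., g_{n-1}] are the growth parameters,
    p = [p_2, ..., p_n] are the prices, s is the total number of trees.
    """
    yields = {}
    inverse_sum = 0.0
    for k, (g_prev, p_k) in enumerate(zip(g, p), start=2):
        inverse_sum += 1.0 / g_prev          # adds 1/g_{k-1}
        yields[k] = p_k * s / inverse_sum
    return yields


def nonharvest_vector(g, k, s=1.0):
    """Equation (19): configuration x when class k is completely harvested."""
    inverses = [1.0 / gi for gi in g[:k - 1]]
    scale = s / sum(inverses)
    n = len(g) + 1
    return [scale * v for v in inverses] + [0.0] * (n - (k - 1))


g = [0.28, 0.31, 0.25, 0.23, 0.37]   # read off the growth matrix G
p = [50, 100, 150, 200, 250]         # prices p_2, ..., p_6 in dollars
ylds = sustainable_yields(g, p)
best_k = max(ylds, key=ylds.get)
for k in sorted(ylds):
    print(k, round(ylds[k], 1))
print("harvest class", best_k)
print([round(v, 3) for v in nonharvest_vector(g, best_k)])
```

Multiplying the per-tree figures by s recovers the yields listed in the example, and the entries of the nonharvest vector sum to s.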


Exercise  Set  10.9 

1.  A certain  forest  is  divided  into  three  height  classes  and  has  a growth  matrix  between  harvests  given  by 

0 0 


If  the  price  of  trees  in  the  second  class  is  $30  and  the  price  of  trees  in  the  third  class  is  $50,  which  class  should  be  completely 
harvested  to  attain  the  optimal  sustainable  yield?  What  is  the  optimal  yield  if  there  are  1000  trees  in  the  forest? 

Answer: 

The  second  class;  $15,000 

2.  In  Example  1 , to  what  level  must  the  price  of  trees  in  the  fifth  class  rise  so  that  the  fifth  class  is  the  one  to  harvest  completely 
in  order  to  attain  the  optimal  sustainable  yield? 

Answer: 

$223 

3.  In Example 1, what must the ratio of the prices \(p_2 : p_3 : p_4 : p_5 : p_6\) be in order that the yields \(\text{Yld}_k\), k = 2, 3, 4, 5, 6, all be the
same? (In this case, any sustainable harvesting policy will produce the same optimal sustainable yield.)

Answer: 

1:1.90:3.02:4.24:5.00 

4.  Derive  Equation  19  for  the  nonharvest  vector  x corresponding  to  the  optimal  sustainable  harvesting  policy  described  in 
Theorem  10.9.2. 

5.  For  the  optimal  sustainable  harvesting  policy  described  in  Theorem  10.9.2,  how  many  trees  are  removed  from  the  forest 
during  each  harvest? 

Answer: 

\(s / (g_1^{-1} + g_2^{-1} + \cdots + g_{k-1}^{-1})\)

6.  If all the growth parameters \(g_1, g_2, \ldots, g_{n-1}\) in the growth matrix G are equal, what should the ratio of the prices
\(p_2 : p_3 : \cdots : p_n\) be in order that any sustainable harvesting policy be an optimal sustainable harvesting policy? (See Exercise 3.)

Answer:

\(1 : 2 : 3 : \cdots : n - 1\)

Technology  Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica,
Maple, Derive, or Mathcad, but it may also be some other type of linear algebra software or a scientific calculator with some
linear  algebra  capabilities.  For  each  exercise  you  will  need  to  read  the  relevant  documentation  for  the  particular  utility  you  are 
using.  The  goal  of  these  exercises  is  to  provide  you  with  a basic  proficiency  with  your  technology  utility.  Once  you  have 
mastered  the  techniques  in  these  exercises,  you  will  be  able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the 
regular  exercise  sets. 


T1.  A particular forest has growth parameters given by

\[
g_i = \frac{1}{i}
\]

for i = 1, 2, 3, ..., n − 1, where n (the total number of height classes) can be chosen as large as needed. Suppose that the value of
a tree in the kth height interval is given by

\[
p_k = a(k-1)^{\rho}
\]

where a is a constant (in dollars) and ρ is a parameter satisfying 1 ≤ ρ ≤ 2.

(a)  Show that the yield \(\text{Yld}_k\) is given by

\[
\text{Yld}_k = \frac{2a(k-1)^{\rho-1}s}{k}
\]

(b)  For

ρ = 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9

use a computer to determine the class number that should be completely harvested, and determine the optimal sustainable
yield in each case. Make sure that you allow k to take on only integer values in your calculations.

(c)  Repeat the calculations in part (b) using

ρ = 1.91, 1.92, 1.93, 1.94, 1.95, 1.96, 1.97, 1.98, 1.99

(d)  Show that if ρ = 2, then the optimal sustainable yield can never be larger than 2as.

(e)  Compare the values of k determined in parts (b) and (c) to 1/(2 − ρ), and use some calculus to explain why

\[
k \approx \frac{1}{2 - \rho}
\]


T2.  A particular forest has growth parameters given by

\[
g_i = \frac{1}{2^i}
\]

for i = 1, 2, 3, ..., n − 1, where n (the total number of height classes) can be chosen as large as needed. Suppose that the value of
a tree in the kth height interval is given by

\[
p_k = a(k-1)^{\rho}
\]

where a is a constant (in dollars) and ρ is a parameter satisfying 1 ≤ ρ.

(a)  Show that the yield \(\text{Yld}_k\) is given by

\[
\text{Yld}_k = \frac{a(k-1)^{\rho}s}{2^k - 2}
\]

(b)  For

ρ = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

use a computer to determine the class number that should be completely harvested in order to obtain an optimal yield, and
determine the optimal sustainable yield in each case. Make sure that you allow k to take on only integer values in your
calculations.

(c)  Compare the values of k determined in part (b) to 1 + ρ/ln(2) and use some calculus to explain why

\[
k \approx 1 + \frac{\rho}{\ln(2)}
\]
10.10 Computer Graphics

In  this  section  we  assume  that  a view  of  a three-dimensional  object  is  displayed  on  a video  screen  and  show 
how  matrix  algebra  can  be  used  to  obtain  new  views  of  the  object  by  rotation,  translation,  and  scaling. 


Prerequisites 

Matrix  Algebra 
Analytic  Geometry 


Visualization  of  a Three-Dimensional  Object 

Suppose  that  we  want  to  visualize  a three-dimensional  object  by  displaying  various  views  of  it  on  a video 
screen.  The  object  we  have  in  mind  to  display  is  to  be  determined  by  a finite  number  of  straight  line  segments. 
As  an  example,  consider  the  truncated  right  pyramid  with  hexagonal  base  illustrated  in  Figure  10.10.1.  We  first 
introduce  an  xyz-coordinate  system  in  which  to  embed  the  object.  As  in  Figure  10.10.1,  we  orient  the  coordinate 
system  so  that  its  origin  is  at  the  center  of  the  video  screen  and  the  xy-plane  coincides  with  the  plane  of  the 
screen.  Consequently,  an  observer  will  see  only  the  projection  of  the  view  of  the  three-dimensional  object  onto 
the  two-dimensional  xy-plane. 




Figure  10.10.1 

In the xyz-coordinate system, the endpoints \(P_1, P_2, \ldots, P_n\) of the straight line segments that determine the view
of the object will have certain coordinates, say,

\[
(x_1, y_1, z_1), \quad (x_2, y_2, z_2), \quad \ldots, \quad (x_n, y_n, z_n)
\]

These  coordinates,  together  with  a specification  of  which  pairs  are  to  be  connected  by  straight  line  segments, 


are to be stored in the memory of the video display system. For example, assume that the 12 vertices of the
truncated pyramid in Figure 10.10.1 have the following coordinates (the screen is 4 units wide by 3 units high):

P1: (1.000, −.800, .000),      P2: (.500, −.800, −.866),
P3: (−.500, −.800, −.866),     P4: (−1.000, −.800, .000),
P5: (−.500, −.800, .866),      P6: (.500, −.800, .866),
P7: (.840, −.400, .000),       P8: (.315, .125, −.546),
P9: (−.210, .650, −.364),      P10: (−.360, .800, .000),
P11: (−.210, .650, .364),      P12: (.315, .125, .546)


These 12 vertices are connected pairwise by 18 straight line segments as follows, where Pi ↔ Pj denotes that
point Pi is connected to point Pj:

P1 ↔ P2,  P2 ↔ P3,  P3 ↔ P4,  P4 ↔ P5,  P5 ↔ P6,  P6 ↔ P1,
P7 ↔ P8,  P8 ↔ P9,  P9 ↔ P10,  P10 ↔ P11,  P11 ↔ P12,  P12 ↔ P7,
P1 ↔ P7,  P2 ↔ P8,  P3 ↔ P9,  P4 ↔ P10,  P5 ↔ P11,  P6 ↔ P12

In View 1 these 18 straight line segments are shown as they would appear on the video screen. It should be
noticed  that  only  the  x-  and  y-coordinates  of  the  vertices  are  needed  by  the  video  display  system  to  draw  the 
view,  because  only  the  projection  of  the  object  onto  the  xy-plane  is  displayed.  However,  we  must  keep  track  of 
the  z-coordinates  to  carry  out  certain  transformations  discussed  later. 
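The stored data described above (vertex coordinates plus a list of connected pairs) can be sketched as a small data structure. This is an illustrative assumption about the layout, not the book's implementation; the names `VERTICES`, `build_edges`, and `project_to_screen` are hypothetical, and the edge list assumes the base hexagon, the top hexagon, and six connecting edges.

```python
# Vertex coordinates of the truncated pyramid, keyed by vertex number.
VERTICES = {
    1: (1.000, -0.800, 0.000),   2: (0.500, -0.800, -0.866),
    3: (-0.500, -0.800, -0.866), 4: (-1.000, -0.800, 0.000),
    5: (-0.500, -0.800, 0.866),  6: (0.500, -0.800, 0.866),
    7: (0.840, -0.400, 0.000),   8: (0.315, 0.125, -0.546),
    9: (-0.210, 0.650, -0.364),  10: (-0.360, 0.800, 0.000),
    11: (-0.210, 0.650, 0.364),  12: (0.315, 0.125, 0.546),
}


def build_edges():
    """Return the 18 connected pairs: base ring, top ring, and side edges."""
    base = [(i, i % 6 + 1) for i in range(1, 7)]          # P1-P2, ..., P6-P1
    top = [(i, (i - 6) % 6 + 7) for i in range(7, 13)]    # P7-P8, ..., P12-P7
    sides = [(i, i + 6) for i in range(1, 7)]             # P1-P7, ..., P6-P12
    return base + top + sides


def project_to_screen(vertices, edges):
    """Drop the z-coordinate: each segment becomes a pair of (x, y) points."""
    return [(vertices[i][:2], vertices[j][:2]) for i, j in edges]


EDGES = build_edges()
segments = project_to_screen(VERTICES, EDGES)
print(len(segments), segments[0])
```

Only the x- and y-coordinates reach the drawing step, while the full triples stay in `VERTICES` so that later transformations can still use the z-coordinates.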


View  1 


We  now  show  how  to  form  new  views  of  the  object  by  scaling,  translating,  or  rotating  the  initial  view.  We  first 
construct  a 3 x n matrix  P , referred  to  as  the  coordinate  matrix  of  the  view , whose  columns  are  the  coordinates 
of  the  n points  of  a view: 


\[
P = \begin{bmatrix}
x_1 & x_2 & \cdots & x_n \\
y_1 & y_2 & \cdots & y_n \\
z_1 & z_2 & \cdots & z_n
\end{bmatrix}
\]


For  example,  the  coordinate  matrix  P corresponding  to  View  1 is  the  3 x 12  matrix 


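\[
\left[\begin{array}{rrrrrrrrrrrr}
1.000 & .500 & -.500 & -1.000 & -.500 & .500 & .840 & .315 & -.210 & -.360 & -.210 & .315 \\
-.800 & -.800 & -.800 & -.800 & -.800 & -.800 & -.400 & .125 & .650 & .800 & .650 & .125 \\
.000 & -.866 & -.866 & .000 & .866 & .866 & .000 & -.546 & -.364 & .000 & .364 & .546
\end{array}\right]
\]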

We will show below how to transform the coordinate matrix P of a view to a new coordinate matrix P′
corresponding  to  a new  view  of  the  object.  The  straight  line  segments  connecting  the  various  points  move  with 
the  points  as  they  are  transformed.  In  this  way,  each  view  is  uniquely  determined  by  its  coordinate  matrix  once 
we  have  specified  which  pairs  of  points  in  the  original  view  are  to  be  connected  by  straight  lines. 


Scaling 


The first type of transformation we consider consists of scaling a view along the x, y, and z directions by factors
of α, β, and γ, respectively. By this we mean that if a point \(P_i\) has coordinates \((x_i, y_i, z_i)\) in the original view, it
is to move to a new point \(P_i'\) with coordinates \((\alpha x_i, \beta y_i, \gamma z_i)\) in the new view. This has the effect of
transforming a unit cube in the original view to a rectangular parallelepiped of dimensions α × β × γ (Figure
10.10.2). Mathematically, this may be accomplished with matrix multiplication as follows. Define a 3 × 3
diagonal matrix

\[
S = \begin{bmatrix}
\alpha & 0 & 0 \\
0 & \beta & 0 \\
0 & 0 & \gamma
\end{bmatrix}
\]

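Then, if a point \(P_i\) in the original view is represented by the column vector

\[
\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}
\]

then the transformed point \(P_i'\) is represented by the column vector

\[
\begin{bmatrix} x_i' \\ y_i' \\ z_i' \end{bmatrix}
=
\begin{bmatrix}
\alpha & 0 & 0 \\
0 & \beta & 0 \\
0 & 0 & \gamma
\end{bmatrix}
\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}
\]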

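Using the coordinate matrix P, which contains the coordinates of all n points of the original view as its columns,
we can transform these n points simultaneously to produce the coordinate matrix P′ of the scaled view, as
follows:

\[
SP =
\begin{bmatrix}
\alpha & 0 & 0 \\
0 & \beta & 0 \\
0 & 0 & \gamma
\end{bmatrix}
\begin{bmatrix}
x_1 & x_2 & \cdots & x_n \\
y_1 & y_2 & \cdots & y_n \\
z_1 & z_2 & \cdots & z_n
\end{bmatrix}
=
\begin{bmatrix}
\alpha x_1 & \alpha x_2 & \cdots & \alpha x_n \\
\beta y_1 & \beta y_2 & \cdots & \beta y_n \\
\gamma z_1 & \gamma z_2 & \cdots & \gamma z_n
\end{bmatrix}
= P'
\]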

The new coordinate matrix can then be entered into the video display system to produce the new view of the
object. As an example, View 2 is View 1 scaled by setting α = 1.8, β = 0.5, and γ = 3.0. Note that the scaling
γ = 3.0 along the z-axis is not visible in View 2, since we see only the projection of the object onto the
xy-plane.
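As a small illustration of the product SP, the sketch below scales the first three columns of the View 1 coordinate matrix. The helper names `matmul` and `scaling_matrix` and the scale factors 2.0, 0.5, and 3.0 are chosen only for illustration; they are assumptions, not values taken from the text.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def scaling_matrix(alpha, beta, gamma):
    """Diagonal matrix S that scales x by alpha, y by beta, and z by gamma."""
    return [[alpha, 0.0, 0.0],
            [0.0, beta, 0.0],
            [0.0, 0.0, gamma]]


# First three columns (P1, P2, P3) of the View 1 coordinate matrix.
P = [[1.000, 0.500, -0.500],
     [-0.800, -0.800, -0.800],
     [0.000, -0.866, -0.866]]

S = scaling_matrix(2.0, 0.5, 3.0)   # illustrative factors
P_scaled = matmul(S, P)
for row in P_scaled:
    print([round(v, 3) for v in row])
```

Each row of the result is the corresponding row of P multiplied by its own factor, so the third row changes even though the change does not affect what is drawn on the screen.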


Figure  10.10.2 


View 1 scaled by α = 1.8, β = 0.5, γ = 3.0


Translation 

We  next  consider  the  transformation  of  translating  or  displacing  an  object  to  a new  position  on  the  screen. 
Referring to Figure 10.10.3, suppose we desire to change an existing view so that each point \(P_i\) with
coordinates \((x_i, y_i, z_i)\) moves to a new point \(P_i'\) with coordinates \((x_i + x_0, y_i + y_0, z_i + z_0)\). The vector

\[
\begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix}
\]

is called the translation vector of the transformation. By defining a 3 × n matrix T as

\[
T = \begin{bmatrix}
x_0 & x_0 & \cdots & x_0 \\
y_0 & y_0 & \cdots & y_0 \\
z_0 & z_0 & \cdots & z_0
\end{bmatrix}
\]

we can translate all n points of the view determined by the coordinate matrix P by matrix addition via the
equation

\[
P' = P + T
\]

The coordinate matrix P′ then specifies the new coordinates of the n points. For example, if we wish to
translate View 1 according to the translation vector

\[
\begin{bmatrix} 1.2 \\ 0.4 \\ 1.7 \end{bmatrix}
\]

the result is View 3. Note, again, that the translation \(z_0 = 1.7\) along the z-axis does not show up explicitly in
View 3.
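The matrix addition P′ = P + T can be sketched as follows, using the translation vector of the example and the first three columns of the View 1 coordinate matrix. The helper names `translation_matrix` and `matadd` are hypothetical.

```python
def translation_matrix(t, n):
    """3 x n matrix T whose every column is the translation vector t."""
    return [[component] * n for component in t]


def matadd(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]


# First three columns (P1, P2, P3) of the View 1 coordinate matrix.
P = [[1.000, 0.500, -0.500],
     [-0.800, -0.800, -0.800],
     [0.000, -0.866, -0.866]]

T = translation_matrix((1.2, 0.4, 1.7), n=3)
P_translated = matadd(P, T)
for row in P_translated:
    print([round(v, 3) for v in row])
```

Every column of P is shifted by the same vector, which is why T is built by repeating the translation vector n times.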


View 1 translated by \(x_0 = 1.2\), \(y_0 = 0.4\), \(z_0 = 1.7\).


In  Exercise  7,  a technique  of  performing  translations  by  matrix  multiplication  rather  than  by  matrix  addition  is 
explained. 


Rotation 

A more  complicated  type  of  transformation  is  a rotation  of  a view  about  one  of  the  three  coordinate  axes.  We 
begin with a rotation about the z-axis (the axis perpendicular to the screen) through an angle θ. Given a point \(P_i\)
in the original view with coordinates \((x_i, y_i, z_i)\), we wish to compute the new coordinates \((x_i', y_i', z_i')\) of the
rotated point \(P_i'\). Referring to Figure 10.10.4 and using a little trigonometry, you should be able to derive the
following:

\[
\begin{aligned}
x_i' &= \rho\cos(\phi + \theta) = \rho\cos\phi\cos\theta - \rho\sin\phi\sin\theta = x_i\cos\theta - y_i\sin\theta \\
y_i' &= \rho\sin(\phi + \theta) = \rho\cos\phi\sin\theta + \rho\sin\phi\cos\theta = x_i\sin\theta + y_i\cos\theta \\
z_i' &= z_i
\end{aligned}
\]

These equations can be written in matrix form as

\[
\begin{bmatrix} x_i' \\ y_i' \\ z_i' \end{bmatrix}
=
\begin{bmatrix}
\cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x_i \\ y_i \\ z_i \end{bmatrix}
\]

If we let R denote the 3 × 3 matrix in this equation, all n points can be rotated by the matrix product P′ = RP to
yield the coordinate matrix P′ of the rotated view.
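The rotation about the z-axis can be sketched in the same way. The helper names below are hypothetical, and the angle is taken in degrees to match the text; the example rotates the first three columns of the View 1 coordinate matrix through 90°.

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def rotation_z(theta_deg):
    """Matrix R for a rotation through theta_deg degrees about the z-axis."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


# First three columns (P1, P2, P3) of the View 1 coordinate matrix.
P = [[1.000, 0.500, -0.500],
     [-0.800, -0.800, -0.800],
     [0.000, -0.866, -0.866]]

P_rotated = matmul(rotation_z(90.0), P)
for row in P_rotated:
    print([round(v, 3) for v in row])
```

For a 90° rotation each point (x, y, z) moves to (−y, x, z), so the first column (1.000, −.800, .000) becomes (.800, 1.000, .000) up to rounding error.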


Figure  10.10.4 

Rotations about the x- and y-axes can be accomplished analogously, and the resulting rotation matrices are
given with Views 4, 5, and 6. These three new views of the truncated pyramid correspond to rotations of View 1
about the x-, y-, and z-axes, respectively, each through an angle of 90°.

Rotation about the x-axis

\[
\begin{bmatrix}
1 & 0 & 0 \\
0 & \cos\theta & -\sin\theta \\
0 & \sin\theta & \cos\theta
\end{bmatrix}
\]

View 1 rotated 90° about the x-axis.


Rotation about the y-axis

\[
\begin{bmatrix}
\cos\theta & 0 & \sin\theta \\
0 & 1 & 0 \\
-\sin\theta & 0 & \cos\theta
\end{bmatrix}
\]

View 1 rotated 90° about the y-axis.


Rotation about the z-axis

\[
\begin{bmatrix}
\cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{bmatrix}
\]

View 1 rotated 90° about the z-axis.


Rotations about three coordinate axes may be combined to give oblique views of an object. For example, View
7 is View 1 rotated first about the x-axis through 30°, then about the y-axis through −70°, and finally about the
z-axis through −27°. Mathematically, these three successive rotations can be embodied in the single
transformation equation P′ = RP, where R is the product of three individual rotation matrices:

\[
R_1 = \begin{bmatrix}
1 & 0 & 0 \\
0 & \cos(30^\circ) & -\sin(30^\circ) \\
0 & \sin(30^\circ) & \cos(30^\circ)
\end{bmatrix}
\]

\[
R_2 = \begin{bmatrix}
\cos(-70^\circ) & 0 & \sin(-70^\circ) \\
0 & 1 & 0 \\
-\sin(-70^\circ) & 0 & \cos(-70^\circ)
\end{bmatrix}
\]

\[
R_3 = \begin{bmatrix}
\cos(-27^\circ) & -\sin(-27^\circ) & 0 \\
\sin(-27^\circ) & \cos(-27^\circ) & 0 \\
0 & 0 & 1
\end{bmatrix}
\]

in the order

\[
R = R_3R_2R_1 = \begin{bmatrix}
.305 & -.025 & -.952 \\
-.155 & .985 & -.076 \\
.940 & .171 & .296
\end{bmatrix}
\]

Oblique  view  of  truncated  pyramid. 
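The product R = R3 R2 R1 can be checked numerically. The sketch below is an illustration with hypothetical helper names; it builds the three rotation matrices from the angles given in the text and multiplies them with the first rotation applied on the right.

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def rotation_x(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rotation_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rotation_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


R1 = rotation_x(30.0)    # applied first
R2 = rotation_y(-70.0)   # applied second
R3 = rotation_z(-27.0)   # applied last

# The first rotation sits rightmost because P' = R3 (R2 (R1 P)).
R = matmul(R3, matmul(R2, R1))
for row in R:
    print([round(v, 3) for v in row])
```

The order matters: rotations about different axes do not commute in general, so multiplying the same three matrices in another order would give a different view.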


As a final illustration, in View 8 we have two separate views of the truncated pyramid, which constitute a
stereoscopic pair. They were produced by first rotating View 7 about the y-axis through an angle of −3° and
translating it to the right, then rotating the same View 7 about the y-axis through an angle of +3° and
translating it to the left. The translation distances were chosen so that the stereoscopic views are about 2½
inches apart, the approximate distance between a pair of eyes.


Stereoscopic  figure  of  truncated  pyramid.  The  three-dimensionality  of  the  diagram  can  be  seen 
by  holding  the  book  about  one  foot  away  and  focusing  on  a distant  object.  Then  by  shifting  your 
gaze  to  View  8 without  refocusing,  you  can  make  the  two  views  of  the  stereoscopic  pair  merge 
together  and  produce  the  desired  effect. 


Exercise  Set  10.10 


1.  View 9 is a view of a square with vertices (0, 0, 0), (1, 0, 0), (1, 1, 0), and (0, 1, 0).

(a)  What is the coordinate matrix of View 9?

(b)  What is the coordinate matrix of View 9 after it is scaled by a factor of 1½ in the x-direction and ½ in the
y-direction? Draw a sketch of the scaled view.

(c)  What  is  the  coordinate  matrix  of  View  9 after  it  is  translated  by  the  following  vector? 


-2 

-1 

3 


Draw  a sketch  of  the  translated  view. 


(d)  What  is  the  coordinate  matrix  of  View  9 after  it  is  rotated  through  an  angle  of  —30°  about  the  z-axis? 
Draw  a sketch  of  the  rotated  view. 


Square  with  vertices  (0,  0,  0),  (1,  0,  0),  (1,  1,0),  and  (0,  1,  0)  (Exercises  1 and  2) 


Answer: 


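(a)
\[
\begin{bmatrix}
0 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]

(b)
\[
\begin{bmatrix}
0 & \tfrac{3}{2} & \tfrac{3}{2} & 0 \\
0 & 0 & \tfrac{1}{2} & \tfrac{1}{2} \\
0 & 0 & 0 & 0
\end{bmatrix}
\]

(c)
\[
\begin{bmatrix}
-2 & -1 & -1 & -2 \\
-1 & -1 & 0 & 0 \\
3 & 3 & 3 & 3
\end{bmatrix}
\]

(d)
\[
\begin{bmatrix}
0 & .866 & 1.366 & .500 \\
0 & -.500 & .366 & .866 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]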


2.  (a)  If the coordinate matrix of View 9 is multiplied by the matrix

\[
\begin{bmatrix}
1 & \tfrac{1}{2} & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\]

the result is the coordinate matrix of View 10. Such a transformation is called a shear in the x-direction
with factor ½ with respect to the y-coordinate. Show that under such a transformation, a point with
coordinates \((x_i, y_i, z_i)\) has new coordinates \((x_i + \tfrac{1}{2}y_i, y_i, z_i)\).


(b)  What  are  the  coordinates  of  the  four  vertices  of  the  shear  square  in  View  10? 


(c)  The  matrix 


1 0 0 
.6  1 0 
0 0 1 

determines  a shear  in  the  y-direction  with  factor  .6  with  respect  to  the  x-coordinate  (an  example  appears 
in  View  11).  Sketch  a view  of  the  square  in  View  9 after  such  a shearing  transformation,  and  find  the 
new  coordinates  of  its  four  vertices. 


View 9 sheared along the x-axis by ½ with respect to the y-coordinate (Exercise 2).


View 1 sheared along the y-axis by .6 with respect to the x-coordinate (Exercise 2).

Answer: 

(b)  (0, 0, 0), (1, 0, 0), (1½, 1, 0), and (½, 1, 0)

(c)  (0, 0, 0), (1, .6, 0), (1, 1.6, 0), and (0, 1, 0)

3.  (a)  The reflection about the xz-plane is defined as the transformation that takes a point \((x_i, y_i, z_i)\) to the
point \((x_i, -y_i, z_i)\) (e.g., View 12). If P and P′ are the coordinate matrices of a view and its reflection
about the xz-plane, respectively, find a matrix M such that P′ = MP.

(b)  Analogous to part (a), define the reflection about the yz-plane and construct the corresponding
transformation matrix. Draw a sketch of View 1 reflected about the yz-plane.

(c)  Analogous  to  part  (a),  define  the  reflection  about  the  xy-plane  and  construct  the  corresponding 
transformation  matrix.  Draw  a sketch  of  View  1 reflected  about  the  xy-plane. 


View  1 reflected  about  the  xz-plane  (Exercise  3). 


Answer: 


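(a)
\[
M = \begin{bmatrix}
1 & 0 & 0 \\
0 & -1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\]

(b)
\[
\begin{bmatrix}
-1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\]

(c)
\[
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & -1
\end{bmatrix}
\]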


(a)  View  13  is  View  1 subject  to  the  following  five  transformations: 


1 • Scale  by  a factor  of  in  the  x-direction,  2 in  the  r-direction,  and  4 in  the  z-direction. 

2.  Translate ½ unit in the x-direction.

3.  Rotate  20°  about  the  x-axis. 

4.  Rotate —45°  about  the  y-axis. 

5.  Rotate  90°  about  the  z-axis. 

Construct the five matrices \(M_1\), \(M_2\), \(M_3\), \(M_4\), and \(M_5\) associated with these five transformations.

(b)  If P is the coordinate matrix of View 1 and P′ is the coordinate matrix of View 13, express P′ in terms
of \(M_1\), \(M_2\), \(M_3\), \(M_4\), \(M_5\), and P.


View  1 scaled,  translated,  and  rotated  (Exercise  4) 


Answer: 


(a) 


M i = 


\ ° 
0 2 

0 0 


' 1 

1 

1 ' 

O 

O 

2 

2 

2 

0 0 

, M2  = 

0 

0 • • • 

0 

, M3  = 

0 cos  20  —sin  20 

0 

0 • • • 

0 

0 sin  20  cos  20 

_ 

cos ( — 45  ) 

0 

1 

O 

m 

1 

s 

CO 

'0 

-1 

o' 

m4  = 

0 

1 

0 

, m5  = 

1 

0 

0 

—sin  ( — 45  ) 

0 

cos ( — 45  ) 

0 

0 

1 

(b)  \(P' = M_5M_4M_3(M_1P + M_2)\)


5.  (a)  View 14 is View 1 subject to the following seven transformations:

1.  Scale by a factor of .3 in the x-direction and by a factor of .5 in the y-direction.

2.  Rotate 45° about the x-axis.

3.  Translate 1 unit in the x-direction.

4.  Rotate 35° about the y-axis.

5.  Rotate −45° about the z-axis.

6.  Translate 1 unit in the z-direction.

7.  Scale by a factor of 2 in the x-direction.

Construct the matrices \(M_1, M_2, \ldots, M_7\) associated with these seven transformations.

(b)  If P is the coordinate matrix of View 1 and P′ is the coordinate matrix of View 14, express P′ in terms
of \(M_1, M_2, \ldots, M_7\), and P.


View  1 scaled,  translated,  and  rotated  (Exercise  5). 


Answer: 


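(a)
\[
M_1 = \begin{bmatrix} .3 & 0 & 0 \\ 0 & .5 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
M_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos 45^\circ & -\sin 45^\circ \\ 0 & \sin 45^\circ & \cos 45^\circ \end{bmatrix}, \quad
M_3 = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \end{bmatrix}
\]

\[
M_4 = \begin{bmatrix} \cos 35^\circ & 0 & \sin 35^\circ \\ 0 & 1 & 0 \\ -\sin 35^\circ & 0 & \cos 35^\circ \end{bmatrix}, \quad
M_5 = \begin{bmatrix} \cos(-45^\circ) & -\sin(-45^\circ) & 0 \\ \sin(-45^\circ) & \cos(-45^\circ) & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

\[
M_6 = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ 1 & 1 & \cdots & 1 \end{bmatrix}, \quad
M_7 = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

(b)  \(P' = M_7(M_5M_4(M_2M_1P + M_3) + M_6)\)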


6.  Suppose that a view with coordinate matrix P is to be rotated through an angle θ about an axis through the
origin and specified by two angles α and β (see Figure Ex-6). If P′ is the coordinate matrix of the rotated
view, find rotation matrices \(R_1\), \(R_2\), \(R_3\), \(R_4\), and \(R_5\) such that

\[
P' = R_5R_4R_3R_2R_1P
\]

[Hint: The desired rotation can be accomplished in the following five steps:

1.  Rotate through an angle of β about the y-axis.

2.  Rotate through an angle of α about the z-axis.

3.  Rotate through an angle of θ about the y-axis.

4.  Rotate through an angle of −α about the z-axis.

5.  Rotate through an angle of −β about the y-axis.]


Answer: 


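\[
R_1 = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}, \quad
R_2 = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

\[
R_3 = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}, \quad
R_4 = \begin{bmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

\[
R_5 = \begin{bmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{bmatrix}
\]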

7.  This exercise illustrates a technique for translating a point with coordinates \((x_i, y_i, z_i)\) to a point with
coordinates \((x_i + x_0, y_i + y_0, z_i + z_0)\) by matrix multiplication rather than matrix addition.

(a)  Let the point \((x_i, y_i, z_i)\) be associated with the column vector

\[
\mathbf{v}_i = \begin{bmatrix} x_i \\ y_i \\ z_i \\ 1 \end{bmatrix}
\]

and let the point \((x_i + x_0, y_i + y_0, z_i + z_0)\) be associated with the column vector

\[
\mathbf{v}_i' = \begin{bmatrix} x_i + x_0 \\ y_i + y_0 \\ z_i + z_0 \\ 1 \end{bmatrix}
\]

Find a 4 × 4 matrix M such that \(\mathbf{v}_i' = M\mathbf{v}_i\).

(b)  Find the specific 4 × 4 matrix of the above form that will effect the translation of the point (4, −2, 3)
to the point (−1, 7, 0).


Answer: 


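(a)
\[
M = \begin{bmatrix}
1 & 0 & 0 & x_0 \\
0 & 1 & 0 & y_0 \\
0 & 0 & 1 & z_0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

(b)
\[
\begin{bmatrix}
1 & 0 & 0 & -5 \\
0 & 1 & 0 & 9 \\
0 & 0 & 1 & -3 \\
0 & 0 & 0 & 1
\end{bmatrix}
\]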


8.  For the three rotation matrices given with Views 4, 5, and 6, show that

\[
R^{-1} = R^T
\]

(A matrix with this property is called an orthogonal matrix. See Section 7.1.)


Technology  Exercises 


The  following  exercises  are  designed  to  be  solved  using  a technology  utility.  Typically,  this  will  be  MATLAB, 
Mathematica,  Maple,  Derive,  or  Mathcad,  but  it  may  also  be  some  other  type  of  linear  algebra  software  or  a 
scientific  calculator  with  some  linear  algebra  capabilities.  For  each  exercise  you  will  need  to  read  the  relevant 
documentation  for  the  particular  utility  you  are  using.  The  goal  of  these  exercises  is  to  provide  you  with  a 
basic  proficiency  with  your  technology  utility.  Once  you  have  mastered  the  techniques  in  these  exercises,  you 
will  be  able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the  regular  exercise  sets. 


T1.  Let (a, b, c) be a unit vector normal to the plane ax + by + cz = 0, and let r = (x, y, z) be a vector. It
can be shown that the mirror image of the vector r through the above plane has coordinates
\(\mathbf{r}_m = (x_m, y_m, z_m)\), where

\[
\begin{bmatrix} x_m \\ y_m \\ z_m \end{bmatrix} = M \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]

with

\[
M = I - 2\mathbf{n}\mathbf{n}^T =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
- 2 \begin{bmatrix} a \\ b \\ c \end{bmatrix} \begin{bmatrix} a & b & c \end{bmatrix}
\]

(a)  Show that \(M^2 = I\) and give a physical reason why this must be so. [Hint: Use the fact that (a, b, c) is a
unit vector to show that \(\mathbf{n}^T\mathbf{n} = 1\).]

(b)  Use a computer to show that det(M) = −1.

(c)  The eigenvectors of M satisfy the equation

\[
\begin{bmatrix} x_m \\ y_m \\ z_m \end{bmatrix} = M \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \lambda \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]

and therefore correspond to those vectors whose direction is not affected by a reflection through the plane.
Use a computer to determine the eigenvectors and eigenvalues of M, and then give a physical argument to
support your answer.


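T2.  A vector v = (x, y, z) is rotated by an angle θ about an axis having unit vector (a, b, c), thereby forming
the rotated vector \(\mathbf{v}_R = (x_R, y_R, z_R)\). It can be shown that

\[
\begin{bmatrix} x_R \\ y_R \\ z_R \end{bmatrix} = R(\theta) \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]

with

\[
R(\theta) = \cos(\theta) \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
+ (1 - \cos(\theta)) \begin{bmatrix} a \\ b \\ c \end{bmatrix} \begin{bmatrix} a & b & c \end{bmatrix}
+ \sin(\theta) \begin{bmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{bmatrix}
\]

(a)  Use a computer to show that \(R(\theta)R(\phi) = R(\theta + \phi)\), and then give a physical reason why this must be so.
Depending on the sophistication of the computer you are using, you may have to experiment using different
values of a, b, and

\[
c = \sqrt{1 - a^2 - b^2}
\]

(b)  Show also that \(R^{-1}(\theta) = R(-\theta)\) and give a physical reason why this must be so.

(c)  Use a computer to show that \(\det(R(\theta)) = +1\).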
10.11 Equilibrium Temperature Distributions

In  this  section  we  will  see  that  the  equilibrium  temperature  distribution  within  a trapezoidal  plate  can  be  found 
when  the  temperatures  around  the  edges  of  the  plate  are  specified.  The  problem  is  reduced  to  solving  a system  of 
linear  equations.  Also,  an  iterative  technique  for  solving  the  problem  and  a “random  walk”  approach  to  the 
problem  are  described. 


Prerequisites 

Linear  Systems 
Matrices 

Intuitive  Understanding  of  Limits 


Boundary  Data 


Suppose  that  the  two  faces  of  the  thin  trapezoidal  plate  shown  in  Figure  10.11.1a  are  insulated  from  heat.  Suppose 
that  we  are  also  given  the  temperature  along  the  four  edges  of  the  plate.  For  example,  let  the  temperature  be 
constant on each edge with values of 0°, 0°, 1°, and 2°, as in the figure. After a period of time, the temperature
inside  the  plate  will  stabilize.  Our  objective  in  this  section  is  to  determine  this  equilibrium  temperature  distribution 
at  the  points  inside  the  plate.  As  we  will  see,  the  interior  equilibrium  temperature  is  completely  determined  by  the 
boundary  data — that  is,  the  temperature  along  the  edges  of  the  plate. 


Figure  10.11.1 

The  equilibrium  temperature  distribution  can  be  visualized  by  the  use  of  curves  that  connect  points  of  equal 
temperature. Such curves are called isotherms of the temperature distribution. In Figure 10.11.1b we have
sketched  a few  isotherms,  using  information  we  derive  later  in  the  chapter. 


Although  all  our  calculations  will  be  for  the  trapezoidal  plate  illustrated,  our  techniques  generalize  easily  to  a plate 
of  any  practical  shape.  They  also  generalize  to  the  problem  of  finding  the  temperature  within  a three-dimensional 
body.  In  fact,  our  “plate”  could  be  the  cross  section  of  some  solid  object  if  the  flow  of  heat  perpendicular  to  the 
cross  section  is  negligible.  For  example,  Figure  10.11.1  could  represent  the  cross  section  of  a long  dam.  The  dam  is 
exposed  to  three  different  temperatures:  the  temperature  of  the  ground  at  its  base,  the  temperature  of  the  water  on 
one  side,  and  the  temperature  of  the  air  on  the  other  side.  A knowledge  of  the  temperature  distribution  inside  the 
dam  is  necessary  to  determine  the  thermal  stresses  to  which  it  is  subjected. 

Next  we  will  consider  a certain  thermodynamic  principle  that  characterizes  the  temperature  distribution  we  are 
seeking. 


The  Mean-Value  Property 

There  are  many  different  ways  to  obtain  a mathematical  model  for  our  problem.  The  approach  we  use  is  based  on 
the  following  property  of  equilibrium  temperature  distributions. 


The  Mean-Value  Property 

Let  a plate  be  in  thermal  equilibrium  and  let  P be  a point  inside  the  plate.  Then  if  C is  any  circle  with 
center  at  P that  is  completely  contained  in  the  plate,  the  temperature  at  P is  the  average  value  of  the 
temperature  on  the  circle  (Figure  10.11.2). 


This  property  is  a consequence  of  certain  basic  laws  of  molecular  motion,  and  we  will  not  attempt  to  derive  it. 
Basically,  this  property  states  that  in  equilibrium,  thermal  energy  tends  to  distribute  itself  as  evenly  as  possible 
consistent  with  the  boundary  conditions.  It  can  be  shown  that  the  mean-value  property  uniquely  determines  the 
equilibrium  temperature  distribution  of  a plate. 

Unfortunately,  determining  the  equilibrium  temperature  distribution  from  the  mean-value  property  is  not  an  easy 
matter.  However,  if  we  restrict  ourselves  to  finding  the  temperature  only  at  a finite  set  of  points  within  the  plate, 
the  problem  can  be  reduced  to  solving  a linear  system.  We  pursue  this  idea  next. 


Discrete  Formulation  of  the  Problem 


We  can  overlay  our  trapezoidal  plate  with  a succession  of  finer  and  finer  square  nets  or  meshes  (Figure  10.11.3).  In 
(a) we have a rather coarse net; in (b) we have a net with half the spacing of that in (a); and in (c) we have a net with
the  spacing  again  reduced  by  half.  The  points  of  intersection  of  the  net  lines  are  called  mesh  points.  We  classify 
them  as  boundary  mesh  points  if  they  fall  on  the  boundary  of  the  plate  or  as  interior  mesh  points  if  they  lie  in  the 
interior  of  the  plate.  For  the  three  net  spacings  we  have  chosen,  there  are  1,  9,  and  49  interior  mesh  points, 
respectively. 

Figure 10.11.3  (a) 1 interior mesh point; (b) 9 interior mesh points; (c) 49 interior mesh points


In  the  discrete  formulation  of  our  problem,  we  try  to  find  the  temperature  only  at  the  interior  mesh  points  of  some 
particular  net.  For  a rather  fine  net,  as  in  (c),  this  will  provide  an  excellent  picture  of  the  temperature  distribution 
throughout  the  entire  plate. 

At  the  boundary  mesh  points,  the  temperature  is  given  by  the  boundary  data.  (In  Figure  10.11.3  we  have  labeled  all 
the  boundary  mesh  points  with  their  corresponding  temperatures.)  At  the  interior  mesh  points,  we  will  apply  the 
following discrete version of the mean-value property.


Discrete  Mean-Value  Property 

At  each  interior  mesh  point,  the  temperature  is  approximately  the  average  of  the  temperatures  at  the  four 
neighboring  mesh  points. 


This discrete version is a reasonable approximation to the true mean-value property. But because it is only an
approximation,  it  will  provide  only  an  approximation  to  the  true  temperatures  at  the  interior  mesh  points.  However, 
the  approximations  will  get  better  as  the  mesh  spacing  decreases.  In  fact,  as  the  mesh  spacing  approaches  zero,  the 
approximations  approach  the  exact  temperature  distribution,  a fact  proved  in  advanced  courses  in  numerical 
analysis.  We  will  illustrate  this  convergence  by  computing  the  approximate  temperatures  at  the  mesh  points  for  the 
three  mesh  spacings  given  in  Figure  10.11.3. 


Case (a) of Figure 10.11.3 is simple, for there is only one interior mesh point. If we let $t_0$ be the temperature at this mesh point, the discrete mean-value property immediately gives

$$t_0 = \tfrac{1}{4}(2 + 1 + 0 + 0) = 0.75$$

In case (b) we can label the temperatures at the nine interior mesh points $t_1, t_2, \ldots, t_9$, as in Figure 10.11.3b. (The particular ordering is not important.) By applying the discrete mean-value property successively to each of these nine mesh points, we obtain the following nine equations:

$$\begin{aligned}
t_1 &= \tfrac{1}{4}(t_2 + 2 + 0 + 0) \\
t_2 &= \tfrac{1}{4}(t_1 + t_3 + t_4 + 2) \\
t_3 &= \tfrac{1}{4}(t_2 + t_5 + 0 + 0) \\
t_4 &= \tfrac{1}{4}(t_2 + t_5 + t_7 + 2) \\
t_5 &= \tfrac{1}{4}(t_3 + t_4 + t_6 + t_8) \\
t_6 &= \tfrac{1}{4}(t_5 + t_9 + 0 + 0) \\
t_7 &= \tfrac{1}{4}(t_4 + t_8 + 1 + 2) \\
t_8 &= \tfrac{1}{4}(t_5 + t_7 + t_9 + 1) \\
t_9 &= \tfrac{1}{4}(t_6 + t_8 + 1 + 0)
\end{aligned} \tag{1}$$

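This is a system of nine linear equations in nine unknowns. We can rewrite it in matrix form as

$$t = Mt + b \tag{2}$$

where

$$t = \begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \\ t_5 \\ t_6 \\ t_7 \\ t_8 \\ t_9 \end{bmatrix}, \quad
M = \frac{1}{4}\begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0
\end{bmatrix}, \quad
b = \begin{bmatrix} \tfrac{1}{2} \\ \tfrac{1}{2} \\ 0 \\ \tfrac{1}{2} \\ 0 \\ 0 \\ \tfrac{3}{4} \\ \tfrac{1}{4} \\ \tfrac{1}{4} \end{bmatrix}$$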


To solve Equation 2, we write it as

$$(I - M)t = b$$

The solution for $t$ is thus

$$t = (I - M)^{-1}b \tag{3}$$

as long as the matrix $(I - M)$ is invertible. This is indeed the case, and the solution for $t$ as calculated by 3 is

$$t = \begin{bmatrix} 0.7846 \\ 1.1383 \\ 0.4719 \\ 1.2967 \\ 0.7491 \\ 0.3265 \\ 1.2995 \\ 0.9014 \\ 0.5570 \end{bmatrix} \tag{4}$$

Figure  10.11.4  is  a diagram  of  the  plate  with  the  nine  interior  mesh  points  labeled  with  their  temperatures  as  given 
by  this  solution. 
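
As a rough numerical check of the nine-point case, the following Python sketch assembles $M$ and $b$ from the nine equations and solves $(I - M)t = b$ by Gaussian elimination. The dictionary `MESH` and the function names are illustrative choices, not part of the text: each entry lists the interior neighbors of a mesh point and the boundary temperatures adjacent to it, as read off from Equations 1.

```python
# For each interior mesh point: (indices of interior neighbors, adjacent boundary temperatures).
# This encoding of Equations 1 is an assumption made for illustration.
MESH = {
    1: ([2], [2, 0, 0]),
    2: ([1, 3, 4], [2]),
    3: ([2, 5], [0, 0]),
    4: ([2, 5, 7], [2]),
    5: ([3, 4, 6, 8], []),
    6: ([5, 9], [0, 0]),
    7: ([4, 8], [1, 2]),
    8: ([5, 7, 9], [1]),
    9: ([6, 8], [1, 0]),
}

def build_system(mesh):
    """Return M and b so that the mesh equations read t = M t + b."""
    n = len(mesh)
    M = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i, (interior, boundary) in mesh.items():
        for j in interior:
            M[i - 1][j - 1] = 0.25
        b[i - 1] = sum(boundary) / 4
    return M, b

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

M, b = build_system(MESH)
size = len(b)
I_minus_M = [[(1.0 if i == j else 0.0) - M[i][j] for j in range(size)] for i in range(size)]
t = solve(I_minus_M, b)
print([round(v, 4) for v in t])
```

The printed values should agree with Equation 4 to about four decimal places. Working with $(I - M)t = b$ directly avoids forming the inverse explicitly, which is the usual choice in numerical work.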

Figure 10.11.4

For case (c) of Figure 10.11.3, we repeat this same procedure. We label the temperatures at the 49 interior mesh points as $t_1, t_2, \ldots, t_{49}$ in some manner. For example, we may begin at the top of the plate and proceed from left to right along each row of mesh points. Applying the discrete mean-value property to each mesh point gives a system of 49 linear equations in 49 unknowns:

$$\begin{aligned}
t_1 &= \tfrac{1}{4}(t_2 + 2 + 0 + 0) \\
t_2 &= \tfrac{1}{4}(t_1 + t_3 + t_4 + 2) \\
&\;\;\vdots \\
t_{48} &= \tfrac{1}{4}(t_{41} + t_{47} + t_{49} + 1) \\
t_{49} &= \tfrac{1}{4}(t_{42} + t_{48} + 0 + 1)
\end{aligned} \tag{5}$$

In matrix form, Equations 5 are

$$t = Mt + b$$

where $t$ and $b$ are column vectors with 49 entries, and $M$ is a 49 × 49 matrix. As in 3, the solution for $t$ is

$$t = (I - M)^{-1}b \tag{6}$$

In  Figure  10.11.5  we  display  the  temperatures  at  the  49  mesh  points  found  by  Equation  6.  The  nine  unshaded 
temperatures  in  this  figure  fall  on  the  mesh  points  of  Figure  10.11.4. 

Figure 10.11.5


In  Table  1 we  compare  the  temperatures  at  these  nine  common  mesh  points  for  the  three  different  mesh  spacings 
used. 


Table 1  Temperatures at Common Mesh Points

         Case (a)    Case (b)    Case (c)
t_1      -           0.7846      0.8048
t_2      -           1.1383      1.1533
t_3      -           0.4719      0.4778
t_4      -           1.2967      1.3078
t_5      0.7500      0.7491      0.7513
t_6      -           0.3265      0.3157
t_7      -           1.2995      1.3042
t_8      -           0.9014      0.9032
t_9      -           0.5570      0.5554

Knowing  that  the  temperatures  of  the  discrete  problem  approach  the  exact  temperatures  as  the  mesh  spacing 
decreases,  we  may  surmise  that  the  nine  temperatures  obtained  in  case  (c)  are  closer  to  the  exact  values  than  those 
in  case  (b). 


A Numerical  Technique 

To obtain the 49 temperatures in case (c) of Figure 10.11.3, it was necessary to solve a linear system with 49
unknowns.  A finer  net  might  involve  a linear  system  with  hundreds  or  even  thousands  of  unknowns.  Exact 
algorithms  for  the  solutions  of  such  large  systems  are  impractical,  and  for  this  reason  we  now  discuss  a numerical 
technique  for  the  practical  solution  of  these  systems. 

To describe this technique, we look again at Equation 2:

$$t = Mt + b \tag{7}$$

The vector $t$ we are seeking appears on both sides of this equation. We consider a way of generating better and better approximations to the vector solution $t$. For the initial approximation we can take $t^{(0)} = 0$ if no better choice is available. If we substitute $t^{(0)}$ into the right side of 7 and label the resulting left side as $t^{(1)}$, we have

$$t^{(1)} = Mt^{(0)} + b \tag{8}$$

If we substitute $t^{(1)}$ into the right side of 7, we generate another approximation, which we label $t^{(2)}$:

$$t^{(2)} = Mt^{(1)} + b \tag{9}$$

Continuing in this way, we generate a sequence of approximations as follows:

$$\begin{aligned}
t^{(1)} &= Mt^{(0)} + b \\
t^{(2)} &= Mt^{(1)} + b \\
t^{(3)} &= Mt^{(2)} + b \\
&\;\;\vdots \\
t^{(n)} &= Mt^{(n-1)} + b \\
&\;\;\vdots
\end{aligned} \tag{10}$$

One would hope that this sequence of approximations $t^{(0)}, t^{(1)}, t^{(2)}, \ldots$ converges to the exact solution of 7. We do not have the space here to go into the theoretical considerations necessary to show this. Suffice it to say that for the particular problem we are considering, the sequence converges to the exact solution for any mesh size and for any initial approximation $t^{(0)}$.


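This technique of generating successive approximations to the solution of 7 is a variation of a technique called Jacobi iteration; the approximations themselves are called iterates. As a numerical example, let us apply Jacobi iteration to the calculation of the nine mesh point temperatures of case (b). Setting $t^{(0)} = 0$, we have, from Equation 2,

$$t^{(1)} = Mt^{(0)} + b = b = \begin{bmatrix} .5000 \\ .5000 \\ .0000 \\ .5000 \\ .0000 \\ .0000 \\ .7500 \\ .2500 \\ .2500 \end{bmatrix}$$

$$t^{(2)} = Mt^{(1)} + b = M\begin{bmatrix} .5000 \\ .5000 \\ .0000 \\ .5000 \\ .0000 \\ .0000 \\ .7500 \\ .2500 \\ .2500 \end{bmatrix} + \begin{bmatrix} .5000 \\ .5000 \\ .0000 \\ .5000 \\ .0000 \\ .0000 \\ .7500 \\ .2500 \\ .2500 \end{bmatrix} = \begin{bmatrix} .6250 \\ .7500 \\ .1250 \\ .8125 \\ .1875 \\ .0625 \\ .9375 \\ .5000 \\ .3125 \end{bmatrix}$$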

Some additional iterates are

$$t^{(3)} = \begin{bmatrix} 0.6875 \\ 0.8906 \\ 0.2344 \\ 0.9688 \\ 0.3750 \\ 0.1250 \\ 1.0781 \\ 0.6094 \\ 0.3906 \end{bmatrix}, \quad
t^{(10)} = \begin{bmatrix} 0.7791 \\ 1.1230 \\ 0.4573 \\ 1.2770 \\ 0.7236 \\ 0.3131 \\ 1.2848 \\ 0.8827 \\ 0.5446 \end{bmatrix}, \quad
t^{(20)} = \begin{bmatrix} 0.7845 \\ 1.1380 \\ 0.4716 \\ 1.2963 \\ 0.7486 \\ 0.3263 \\ 1.2992 \\ 0.9010 \\ 0.5567 \end{bmatrix}, \quad
t^{(30)} = \begin{bmatrix} 0.7846 \\ 1.1383 \\ 0.4719 \\ 1.2967 \\ 0.7491 \\ 0.3265 \\ 1.2995 \\ 0.9014 \\ 0.5570 \end{bmatrix}$$

All iterates beginning with the thirtieth are equal to $t^{(30)}$ to four decimal places. Consequently, $t^{(30)}$ is the exact solution to four decimal places. This agrees with our previous result given in Equation 4.

The Jacobi iteration scheme applied to the linear system 5 with 49 unknowns produces iterates that begin repeating to four decimal places after 119 iterations. Thus, $t^{(119)}$ would provide the 49 temperatures of case (c) correct to four decimal places.
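
The iteration $t^{(k+1)} = Mt^{(k)} + b$ for the nine-point mesh can be sketched in a few lines of Python. This is a minimal illustration, not the text's own program: the dictionary `MESH`, the function names, and the stopping rule (stop once two successive iterates agree after rounding to four decimal places) are assumptions made here.

```python
# For each interior mesh point: (indices of interior neighbors, adjacent boundary temperatures).
# This encoding of the nine mesh equations is an assumption made for illustration.
MESH = {
    1: ([2], [2, 0, 0]),
    2: ([1, 3, 4], [2]),
    3: ([2, 5], [0, 0]),
    4: ([2, 5, 7], [2]),
    5: ([3, 4, 6, 8], []),
    6: ([5, 9], [0, 0]),
    7: ([4, 8], [1, 2]),
    8: ([5, 7, 9], [1]),
    9: ([6, 8], [1, 0]),
}

def jacobi_step(t, mesh):
    """One sweep t_new = M t + b, written out neighbor by neighbor."""
    return [(sum(t[j - 1] for j in interior) + sum(boundary)) / 4
            for _, (interior, boundary) in sorted(mesh.items())]

def jacobi(mesh, decimals=4, max_iter=1000):
    """Iterate from t = 0; return the list of all iterates t(0), t(1), ..."""
    t = [0.0] * len(mesh)
    history = [t]
    for _ in range(max_iter):
        t_next = jacobi_step(t, mesh)
        history.append(t_next)
        # Hypothetical stopping rule: successive iterates agree after rounding.
        if [round(v, decimals) for v in t_next] == [round(v, decimals) for v in t]:
            break
        t = t_next
    return history

iterates = jacobi(MESH)
print("t(2) =", iterates[2])
print("stopped after", len(iterates) - 1, "iterations:",
      [round(v, 4) for v in iterates[-1]])
```

Each sweep uses only the previous iterate, so no matrix needs to be stored; that is the practical appeal of the method when the mesh has thousands of points.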


A Monte  Carlo  Technique 

In  this  section  we  describe  a so-called  Monte  Carlo  technique  for  computing  the  temperature  at  a single  interior 
mesh  point  of  the  discrete  problem  without  having  to  compute  the  temperatures  at  the  remaining  interior  mesh 
points.  First  we  define  a discrete  random  walk  along  the  net.  By  this  we  mean  a directed  path  along  the  net  lines 
(Figure  10.11.6)  that  joins  a succession  of  mesh  points  such  that  the  direction  of  departure  from  each  mesh  point  is 
chosen  at  random.  Each  of  the  four  possible  directions  of  departure  from  each  mesh  point  along  the  path  is  to  be 
equally  probable. 

Figure 10.11.6

By  the  use  of  random  walks,  we  can  compute  the  temperature  at  a specified  interior  mesh  point  on  the  basis  of  the 
following  property. 


Random  Walk  Property 


Let $W_1, W_2, \ldots, W_n$ be a succession of random walks, all of which begin at a specified interior mesh point. Let $t_1^*, t_2^*, \ldots, t_n^*$ be the temperatures at the boundary mesh points first encountered along each of these random walks. Then the average value $(t_1^* + t_2^* + \cdots + t_n^*)/n$ of these boundary temperatures approaches the temperature at the specified interior mesh point as the number of random walks $n$ increases without bound.


This property is a consequence of the discrete mean-value property that the mesh point temperatures satisfy. The
proof  of  the  random  walk  property  involves  elementary  concepts  from  probability  theory,  and  we  will  not  give  it 
here. 


In Table 2 we display the results of a large number of computer-generated random walks for the evaluation of the temperature $t_5$ of the nine-point mesh of case (b) in Figure 10.11.6. The first column lists the number $n$ of the random walk. The second column lists the temperature $t_n^*$ of the boundary point first encountered along the corresponding random walk. The last column contains the cumulative average of the boundary temperatures encountered along the $n$ random walks. Thus, after 1000 random walks we have the approximation $t_5 \approx .7550$. This compares with the exact value $t_5 = .7491$ that we had previously evaluated. As can be seen, the convergence to the exact value is not too rapid.
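
A random-walk estimate of this kind can be simulated with Python's standard `random` module. The sketch below is an illustration under stated assumptions: the dictionary `MESH` encodes, for each interior point of the nine-point mesh, its interior neighbors and adjacent boundary temperatures; the function names, the seed, and the number of walks are arbitrary choices, so the output will not reproduce the particular walks recorded in Table 2.

```python
import random

# For each interior mesh point: (indices of interior neighbors, adjacent boundary temperatures).
# Every point has exactly four neighbors in total. Encoding assumed for illustration.
MESH = {
    1: ([2], [2, 0, 0]),
    2: ([1, 3, 4], [2]),
    3: ([2, 5], [0, 0]),
    4: ([2, 5, 7], [2]),
    5: ([3, 4, 6, 8], []),
    6: ([5, 9], [0, 0]),
    7: ([4, 8], [1, 2]),
    8: ([5, 7, 9], [1]),
    9: ([6, 8], [1, 0]),
}

def random_walk(start, mesh, rng):
    """Walk from `start` until a boundary point is hit; return its temperature."""
    point = start
    while True:
        interior, boundary = mesh[point]
        k = rng.randrange(4)  # four equally probable directions of departure
        if k < len(interior):
            point = interior[k]  # stepped to another interior mesh point
        else:
            return boundary[k - len(interior)]  # stepped onto the boundary

def estimate(start, mesh, n_walks, seed=0):
    """Average of the first-hit boundary temperatures over n_walks random walks."""
    rng = random.Random(seed)
    return sum(random_walk(start, mesh, rng) for _ in range(n_walks)) / n_walks

estimate_t5 = estimate(5, MESH, 20000)
print("estimate of t_5 from 20000 walks:", round(estimate_t5, 4))
```

Because the statistical error shrinks only like $1/\sqrt{n}$, many walks are needed for each additional digit, which is consistent with the slow convergence visible in Table 2.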

Table 2

n        t_n^*    (t_1^* + ... + t_n^*)/n
1        1        1.0000
2        2        1.5000
3        1        1.3333
4        0        1.0000
5        2        1.2000
6        0        1.0000
7        2        1.1429
8        0        1.0000
9        2        1.1111
10       0        1.0000
20       1        0.9500
30       0        0.8000
40       0        0.8250
50       2        0.8400
100      0        0.8300
150      1        0.8000
200      0        0.8050
250      1        0.8240
500      1        0.7860
1000     0        0.7550

Exercise  Set  10.11 

1. A plate in the form of a circular disk has boundary temperatures of 0° on the left half of its circumference and 1° on the right half of its circumference. A net with four interior mesh points is overlaid on the disk (see Figure Ex-1).

(a) Using the discrete mean-value property, write the 4 × 4 linear system $t = Mt + b$ that determines the approximate temperatures at the four interior mesh points.

(b) Solve the linear system in part (a).

(c) Use the Jacobi iteration scheme with $t^{(0)} = 0$ to generate the iterates $t^{(1)}$, $t^{(2)}$, $t^{(3)}$, $t^{(4)}$, and $t^{(5)}$ for the linear system in part (a). What is the “error vector” $t^{(5)} - t$, where $t$ is the solution found in part (b)?

(d) By certain advanced methods, it can be determined that the exact temperatures to four decimal places at the four mesh points are $t_1 = t_3 = .2871$ and $t_2 = t_4 = .7129$. What are the percentage errors in the values found in part (b)?


Figure  Ex-1 


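Answer:

(a)
$$\begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \end{bmatrix} = \begin{bmatrix} 0 & \tfrac{1}{4} & \tfrac{1}{4} & 0 \\ \tfrac{1}{4} & 0 & 0 & \tfrac{1}{4} \\ \tfrac{1}{4} & 0 & 0 & \tfrac{1}{4} \\ 0 & \tfrac{1}{4} & \tfrac{1}{4} & 0 \end{bmatrix} \begin{bmatrix} t_1 \\ t_2 \\ t_3 \\ t_4 \end{bmatrix} + \begin{bmatrix} 0 \\ \tfrac{1}{2} \\ 0 \\ \tfrac{1}{2} \end{bmatrix}$$

(b)
$$t = \begin{bmatrix} \tfrac{1}{4} \\ \tfrac{3}{4} \\ \tfrac{1}{4} \\ \tfrac{3}{4} \end{bmatrix}$$

(c)
$$t^{(1)} = \begin{bmatrix} 0 \\ \tfrac{1}{2} \\ 0 \\ \tfrac{1}{2} \end{bmatrix}, \quad
t^{(2)} = \begin{bmatrix} \tfrac{1}{8} \\ \tfrac{5}{8} \\ \tfrac{1}{8} \\ \tfrac{5}{8} \end{bmatrix}, \quad
t^{(3)} = \begin{bmatrix} \tfrac{3}{16} \\ \tfrac{11}{16} \\ \tfrac{3}{16} \\ \tfrac{11}{16} \end{bmatrix}, \quad
t^{(4)} = \begin{bmatrix} \tfrac{7}{32} \\ \tfrac{23}{32} \\ \tfrac{7}{32} \\ \tfrac{23}{32} \end{bmatrix}, \quad
t^{(5)} = \begin{bmatrix} \tfrac{15}{64} \\ \tfrac{47}{64} \\ \tfrac{15}{64} \\ \tfrac{47}{64} \end{bmatrix}, \quad
t^{(5)} - t = \begin{bmatrix} -\tfrac{1}{64} \\ -\tfrac{1}{64} \\ -\tfrac{1}{64} \\ -\tfrac{1}{64} \end{bmatrix}$$

(d) for $t_1$ and $t_3$, -12.9%; for $t_2$ and $t_4$, 5.2%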

2. Use Theorem 10.11.1 to find the exact equilibrium temperature at the center of the disk in Exercise 1.


Answer:

$\tfrac{1}{2}$

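3. Calculate the first two iterates $t^{(1)}$ and $t^{(2)}$ for case (b) of Figure 10.11.3 with nine interior mesh points [Equation 2] when the initial iterate is chosen as

$$t^{(0)} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}^T$$


Answer:

$$t^{(1)} = \begin{bmatrix} \tfrac{3}{4} & \tfrac{5}{4} & \tfrac{2}{4} & \tfrac{5}{4} & \tfrac{4}{4} & \tfrac{2}{4} & \tfrac{5}{4} & \tfrac{4}{4} & \tfrac{3}{4} \end{bmatrix}^T$$

$$t^{(2)} = \begin{bmatrix} \tfrac{13}{16} & \tfrac{18}{16} & \tfrac{9}{16} & \tfrac{22}{16} & \tfrac{13}{16} & \tfrac{7}{16} & \tfrac{21}{16} & \tfrac{16}{16} & \tfrac{10}{16} \end{bmatrix}^T$$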

4. The random walk illustrated in Figure Ex-4a can be described by six arrows

«-  1 — — T — 

that specify the directions of departure from the successive mesh points along the path. Figure Ex-4b is an array of 100 computer-generated, randomly oriented arrows arranged in a 10 × 10 array. Use these arrows to determine random walks to approximate the temperature $t_5$, as in Table 2. Proceed as follows:

1. Take the last two digits of your telephone number. Use the last digit to specify a row and the other to specify a column.

2. Go to the arrow in the array with that row and column number.

3. Using this arrow as a starting point, move through the array of arrows as you would read a book (left to right and top to bottom). Beginning at the point labeled $t_5$ in Figure Ex-4a and using this sequence of arrows to specify a sequence of directions, move from mesh point to mesh point until you reach a boundary mesh point. This completes your first random walk. Record the temperature at the boundary mesh point. (If you reach the end of the arrow array, continue with the arrow in the upper left corner.)

4. Return to the interior mesh point labeled $t_5$ and begin where you left off in the arrow array; generate your next random walk. Repeat this process until you have completed 10 random walks and have recorded 10 boundary temperatures.

5. Calculate the average of the 10 boundary temperatures recorded. (The exact value is $t_5 = .7491$.)

Figure Ex-4


Technology  Exercises 


The  following  exercises  are  designed  to  be  solved  using  a technology  utility.  Typically,  this  will  be  MATLAB, 
Mathematica,  Maple,  Derive,  or  Mathcad,  but  it  may  also  be  some  other  type  of  linear  algebra  software  or  a 
scientific  calculator  with  some  linear  algebra  capabilities.  For  each  exercise  you  will  need  to  read  the  relevant 
documentation  for  the  particular  utility  you  are  using.  The  goal  of  these  exercises  is  to  provide  you  with  a basic 
proficiency  with  your  technology  utility.  Once  you  have  mastered  the  techniques  in  these  exercises,  you  will  be 
able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the  regular  exercise  sets. 


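T1. Suppose that we have the square region described by

$$R = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1\}$$

and suppose that the equilibrium temperature distribution $u(x, y)$ along the boundary is given by $u(x, 0) = T_B$, $u(x, 1) = T_T$, $u(0, y) = T_L$, and $u(1, y) = T_R$. Suppose next that this region is partitioned into an $(n + 1) \times (n + 1)$ mesh using

$$x_i = \frac{i}{n} \quad \text{and} \quad y_j = \frac{j}{n}$$

for $i = 0, 1, 2, \ldots, n$ and $j = 0, 1, 2, \ldots, n$. If the temperatures of the interior mesh points are labeled by

$$u_{i,j} = u(x_i, y_j) = u(i/n, j/n)$$

then show that

$$u_{i,j} = \tfrac{1}{4}(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1})$$

for $i = 1, 2, 3, \ldots, n - 1$ and $j = 1, 2, 3, \ldots, n - 1$. To handle the boundary points, define

$$u_{0,j} = T_L, \quad u_{n,j} = T_R, \quad u_{i,0} = T_B, \quad \text{and} \quad u_{i,n} = T_T$$

for $i = 1, 2, 3, \ldots, n - 1$ and $j = 1, 2, 3, \ldots, n - 1$. Next let

$$F_{n+1} = \begin{bmatrix} 0 & I_n \\ 1 & 0 \end{bmatrix}$$

be the $(n + 1) \times (n + 1)$ matrix with the $n \times n$ identity matrix in the upper right-hand corner, a one in the lower left-hand corner, and zeros everywhere else. For example,

$$F_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad
F_3 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \quad
F_4 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}, \quad
F_5 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix}$$

and so on. By defining the $(n + 1) \times (n + 1)$ matrix

$$M_{n+1} = F_{n+1} + F_{n+1}^T = \begin{bmatrix} 0 & I_n \\ 1 & 0 \end{bmatrix} + \begin{bmatrix} 0 & I_n \\ 1 & 0 \end{bmatrix}^T$$

show that if $U_{n+1}$ is the $(n + 1) \times (n + 1)$ matrix with entries $u_{i,j}$, then the set of equations

$$u_{i,j} = \tfrac{1}{4}(u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1})$$

for $i = 1, 2, 3, \ldots, n - 1$ and $j = 1, 2, 3, \ldots, n - 1$ can be written as the matrix equation

$$U_{n+1} = \tfrac{1}{4}(M_{n+1}U_{n+1} + U_{n+1}M_{n+1})$$

where we consider only those elements of $U_{n+1}$ with $i = 1, 2, 3, \ldots, n - 1$ and $j = 1, 2, 3, \ldots, n - 1$.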
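
Before attempting the proof, it can help to test the matrix identity numerically. The Python sketch below builds the shift matrix and its symmetrized sum for $n = 6$ and compares $\tfrac{1}{4}(MU + UM)$ with the four-neighbor average on the interior entries of a randomly filled matrix $U$. The function names, the choice $n = 6$, and the random test matrix are assumptions made for illustration; a numerical check is not a substitute for the argument the exercise asks for.

```python
import random

def shift_matrix(size):
    """F: ones just above the diagonal and a single one in the lower left-hand corner."""
    F = [[0] * size for _ in range(size)]
    for i in range(size - 1):
        F[i][i + 1] = 1
    F[size - 1][0] = 1
    return F

def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

n = 6
size = n + 1
F = shift_matrix(size)
M = [[F[i][j] + F[j][i] for j in range(size)] for i in range(size)]  # M = F + F^T

rng = random.Random(1)  # arbitrary seed; U is just a test matrix
U = [[rng.random() for _ in range(size)] for _ in range(size)]

MU, UM = matmul(M, U), matmul(U, M)
max_gap = max(
    abs((MU[i][j] + UM[i][j]) / 4
        - (U[i - 1][j] + U[i + 1][j] + U[i][j - 1] + U[i][j + 1]) / 4)
    for i in range(1, n) for j in range(1, n)
)
print("largest interior discrepancy:", max_gap)
```

Only the interior indices $1, \ldots, n - 1$ are compared, because the wrap-around entries of $F$ affect the first and last rows and columns.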

T2. The results of the preceding exercise and the discussion in the text suggest the following algorithm for solving for the equilibrium temperature in the square region

$$R = \{(x, y) \mid 0 \le x \le 1,\ 0 \le y \le 1\}$$

given the boundary conditions

$$u(x, 0) = T_B, \quad u(x, 1) = T_T, \quad u(0, y) = T_L, \quad u(1, y) = T_R$$

1. Choose a value for $n$, and then choose an initial guess, say

$$U_{n+1}^{(0)} = \begin{bmatrix}
0 & T_L & \cdots & T_L & 0 \\
T_B & 0 & \cdots & 0 & T_T \\
\vdots & \vdots & & \vdots & \vdots \\
T_B & 0 & \cdots & 0 & T_T \\
0 & T_R & \cdots & T_R & 0
\end{bmatrix}$$

2. Compute $U_{n+1}^{(k+1)}$ using

$$U_{n+1}^{(k+1)} = \tfrac{1}{4}\left(M_{n+1}U_{n+1}^{(k)} + U_{n+1}^{(k)}M_{n+1}\right)$$

where $M_{n+1}$ is as defined in Exercise T1. Then adjust $U_{n+1}^{(k+1)}$ by replacing all edge entries by the initial edge entries in $U_{n+1}^{(0)}$. [Note: The edge entries of a matrix are the entries in the first and last columns and first and last rows.]

3. Continue this process until $U_{n+1}^{(k+1)} - U_{n+1}^{(k)}$ is approximately the zero matrix. This suggests that

$$U_{n+1} = \lim_{k \to \infty} U_{n+1}^{(k)}$$

Use a computer and this algorithm to solve for $u(x, y)$ given that

$$u(x, 0) = 0, \quad u(x, 1) = 0, \quad u(0, y) = 0, \quad u(1, y) = 2$$

Choose $n = 6$ and compute up to $U_7^{(k)}$ for a suitably large $k$. The exact solution can be expressed as

$$u(x, y) = \frac{8}{\pi} \sum_{m=1}^{\infty} \frac{\sinh[(2m - 1)\pi x]\,\sin[(2m - 1)\pi y]}{(2m - 1)\sinh[(2m - 1)\pi]}$$

Use a computer to compute $u(i/6, j/6)$ for $i, j = 0, 1, 2, 3, 4, 5, 6$, and then compare your results to the values of $u(i/6, j/6)$ in your final iterate.

T3. Using the exact solution $u(x, y)$ for the temperature distribution described in Exercise T2, use a graphing program to do the following:

(a) Plot the surface $z = u(x, y)$ in three-dimensional xyz-space in which $z$ is the temperature at the point $(x, y)$ in the square region.

(b) Plot several isotherms of the temperature distribution (curves in the xy-plane over which the temperature is a constant).

(c) Plot several curves of the temperature as a function of $x$ with $y$ held constant.

(d) Plot several curves of the temperature as a function of $y$ with $x$ held constant.

10.12 Computed Tomography

In  this  section  we  will  see  how  constructing  a cross-sectional  view  of  a human  body  by  analyzing  X-ray  scans  leads  to  an  inconsistent  linear 
system.  We  present  an  iteration  technique  that  provides  an  “approximate  solution”  of  the  linear  system. 


Prerequisites 

Linear  Systems 
Natural  Logarithms 
Euclidean  Space  Rn 


The  basic  problem  of  computed  tomography  is  to  construct  an  image  of  a cross  section  of  the  human  body  using  data  collected  from  many 
individual  beams  of  X rays  that  are  passed  through  the  cross  section.  These  data  are  processed  by  a computer,  and  the  computed  cross  section  is 
displayed  on  a video  monitor.  Figure  10.12.1  is  a diagram  of  General  Electric's  CT  system  showing  a patient  prepared  to  have  a cross  section  of 
his  head  scanned  by  X-ray  beams. 


Figure  10.12.1 

Such a system is also known as a CAT scanner, for Computer-Aided Tomography scanner. Figure 10.12.2 shows a typical cross section of a
human  head  produced  by  the  system. 


Figure  10.12.2 

The  first  commercial  system  of  computed  tomography  for  medical  use  was  developed  in  1971  by  G.  N.  Hounsfield  of  EMI,  Ltd.,  in  England.  In 
1979, Hounsfield and A. M. Cormack were awarded the Nobel Prize for their pioneering work in the field. As we will see in this section, the
construction  of  a cross  section,  or  tomograph,  requires  the  solution  of  a large  linear  system  of  equations.  Certain  algorithms,  called  algebraic 
reconstruction  techniques  (ARTs),  can  be  used  to  solve  these  linear  systems,  whose  solutions  yield  the  cross  sections  in  digital  form. 


Scanning  Modes 

Unlike  conventional  X-ray  pictures  that  are  formed  by  X rays  that  are  projected  perpendicular  to  the  plane  of  the  picture,  tomographs  are 
constructed  from  thousands  of  individual,  hairline-thin  X-ray  beams  that  lie  in  the  plane  of  the  cross  section.  After  they  pass  through  the  cross 
section, the intensities of the X-ray beams are measured by an X-ray detector, and these measurements are relayed to a computer where they are processed. Figures 10.12.3 and 10.12.4 illustrate two possible modes of scanning the cross section: the parallel mode and the fan-beam mode.

In  the  parallel  mode  a single  X-ray  source  and  X-ray  detector  pair  are  translated  across  the  field  of  view  containing  the  cross  section,  and  many 
measurements  of  the  parallel  beams  are  recorded.  Then  the  source  and  detector  pair  are  rotated  through  a small  angle,  and  another  set  of 
measurements  is  taken.  This  is  repeated  until  the  desired  number  of  beam  measurements  is  completed.  For  example,  in  the  original  1971 
machine, 160 parallel measurements were taken through 180 angles spaced 1° apart: a total of 160 × 180 = 28,800 beam measurements. Each


Figure 10.12.3  Parallel mode (labels: X-ray source, X-ray detector)


In  the  fan-beam  mode  of  scanning,  a single  X-ray  tube  generates  a fan  of  collimated  beams  whose  intensities  are  measured  simultaneously  by 
an  array  of  detectors  on  the  other  side  of  the  field  of  view.  The  X-ray  tube  and  detector  array  are  rotated  through  many  angles,  and  a set  of 
measurements  is  taken  at  each  angle  until  the  scan  is  completed.  In  the  General  Electric  CT  system,  which  uses  the  fan-beam  mode,  each  scan 
takes  1 second. 


Derivation  of  Equations 

To  see  how  the  cross  section  is  reconstructed  from  the  many  individual  beam  measurements,  refer  to  Figure  10.12.5.  Here  the  field  of  view  in 
which  the  cross  section  is  situated  has  been  divided  into  many  square  pixels  (picture  elements)  numbered  1 through  N as  indicated.  It  is  our 
desire to determine the X-ray density of each pixel. In the EMI system, 6400 pixels were used, arranged in a square 80 × 80 array. The G.E. CT
system uses 262,144 pixels in a 512 × 512 array, each pixel being about 1 mm on a side. After the densities of the pixels are determined by the
method  we  will  describe,  they  are  reproduced  on  a video  monitor,  with  each  pixel  shaded  a level  of  gray  proportional  to  its  X-ray  density. 
Because  different  tissues  within  the  human  body  have  different  X-ray  densities,  the  video  display  clearly  distinguishes  the  various  tissues  and 
organs  within  the  cross  section. 


Figure 10.12.6 shows a single pixel with an X-ray beam of roughly the same width as the pixel passing squarely through it. The photons constituting the X-ray beam are absorbed by the tissue within the pixel at a rate proportional to the X-ray density of the tissue. Quantitatively, the X-ray density of the $j$th pixel is denoted by $x_j$ and is defined by

$$x_j = \ln\left(\frac{\text{number of photons entering the } j\text{th pixel}}{\text{number of photons leaving the } j\text{th pixel}}\right)$$

where “ln” denotes the natural logarithmic function. Using the logarithm property $\ln(a/b) = -\ln(b/a)$, we also have

$$x_j = -\ln\left(\text{fraction of photons that pass through the } j\text{th pixel without being absorbed}\right)$$

Figure 10.12.6  (labels: photons entering the jth pixel; photons leaving the jth pixel)

If the X-ray beam passes through an entire row of pixels (Figure 10.12.7), then the number of photons leaving one pixel is equal to the number of photons entering the next pixel in the row. If the pixels are numbered $1, 2, \ldots, n$, then the additive property of the logarithmic function gives

$$x_1 + x_2 + \cdots + x_n = \ln\left(\frac{\text{number of photons entering the first pixel}}{\text{number of photons leaving the } n\text{th pixel}}\right) = -\ln\left(\text{fraction of photons that pass through the row of } n \text{ pixels without being absorbed}\right) \tag{1}$$



Thus,  to  determine  the  total  X-ray  density  of  a row  of  pixels,  we  simply  sum  the  individual  pixel  densities. 
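
The additivity of the pixel densities along a row is easy to confirm numerically. In the Python sketch below the photon counts are made-up values chosen only for illustration; the variable names are likewise hypothetical.

```python
import math

# Hypothetical photon counts: 10000 photons enter the first pixel, and the
# remaining entries are the numbers leaving pixels 1, 2, 3 and 4 in turn.
counts = [10000, 8000, 6000, 5400, 2700]

# Density of each pixel: ln(photons entering / photons leaving).
densities = [math.log(entering / leaving)
             for entering, leaving in zip(counts, counts[1:])]

# Density of the whole row: ln(entering the first pixel / leaving the last pixel).
row_density = math.log(counts[0] / counts[-1])

print("sum of pixel densities:", sum(densities))
print("density of the row:    ", row_density)
```

The two printed numbers agree up to rounding because the intermediate counts cancel when the logarithms are added, which is the content of Equation 1.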


Figure  10.12.7 


Next, consider the X-ray beam in Figure 10.12.5. By the beam density of the $i$th beam of a scan, denoted by $b_i$, we mean

$$b_i = \ln\left(\frac{\text{number of photons of the } i\text{th beam entering the detector without the cross section in the field of view}}{\text{number of photons of the } i\text{th beam entering the detector with the cross section in the field of view}}\right) = -\ln\left(\text{fraction of photons of the } i\text{th beam that pass through the cross section without being absorbed}\right) \tag{2}$$

The numerator in the first expression for $b_i$ is obtained by performing a calibration scan without the cross section in the field of view. The resulting detector measurements are stored within the computer's memory. Then a clinical scan is performed with the cross section in the field of view, the $b_i$'s of all the beams constituting the scan are computed, and the values are stored for further processing.


For each beam that passes squarely through a row of pixels, we must have

$$\left(\text{fraction of photons of the beam that pass through the row of pixels without being absorbed}\right) = \left(\text{fraction of photons of the beam that pass through the cross section without being absorbed}\right)$$

Thus, if the $i$th beam passes squarely through a row of $n$ pixels, then it follows from Equations 1 and 2 that

$$x_1 + x_2 + \cdots + x_n = b_i$$

In this equation, $b_i$ is known from the clinical and calibration measurements, and $x_1, x_2, \ldots, x_n$ are unknown pixel densities that must be determined.


More  generally,  if  the  z'th  beam  passes  squarely  through  a row  (or  column)  of  pixels  with  numbers  j\,  j2 ji,  then  we  have 

*Jl+*i2  + ~ + *h  = bt 


If we set

$$a_{ij} = \begin{cases} 1, & j = j_1, j_2, \ldots, j_i \\ 0, & \text{otherwise} \end{cases}$$

then we can write this equation as

$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{iN}x_N = b_i \tag{3}$$


We  will  refer  to  Equation  3 as  the  ith  beam  equation. 

Referring  to  Figure  10.12.5,  however,  we  see  that  the  beams  of  a scan  do  not  necessarily  pass  through  a row  or  column  of  pixels  squarely. 
Instead,  a typical  beam  passes  diagonally  through  each  pixel  in  its  path.  There  are  many  ways  to  take  this  into  account.  In  Figure  10.12.8  we 
outline  three  methods  of  defining  the  quantities  aij  that  appear  in  Equation  3,  each  of  which  reduces  to  our  previous  definition  when  the  beam 
passes  squarely  through  a row  or  column  of  pixels.  Reading  down  the  figure,  each  method  is  more  exact  than  its  predecessor,  but  with 
successively  more  computational  difficulty. 


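Using any one of the three methods to define the $a_{ij}$'s in the ith beam equation, we can write the set of M beam equations in a complete scan as

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N &= b_2 \\
&\ \ \vdots \\
a_{M1}x_1 + a_{M2}x_2 + \cdots + a_{MN}x_N &= b_M
\end{aligned} \tag{4}$$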

In  this  way  we  have  a linear  system  of  M equations  (the  M beam  equations)  in  N unknowns  (the  N pixel  densities). 


Depending on the number of beams and pixels used, we can have M > N, M = N, or M < N. We will consider only the case M > N, the
so-called overdetermined case, in which there are more beams in the scan than pixels in the field of view. Because of inherent modeling and
experimental  errors  in  the  problem,  we  should  not  expect  our  linear  system  to  have  an  exact  mathematical  solution  for  the  pixel  densities.  In  the 
next  section  we  attempt  to  find  an  “approximate”  solution  to  this  linear  system. 


Algebraic  Reconstruction  Techniques 

There  have  been  many  mathematical  algorithms  devised  to  treat  the  overdetermined  linear  system  4.  The  one  we  will  describe  belongs  to  the 
class  of  so-called  Algebraic  Reconstruction  Techniques  (ARTs).  This  method,  which  can  be  traced  to  an  iterative  technique  originally 
introduced  by  S.  Kaczmarz  in  1937,  was  the  one  used  in  the  first  commercial  machine.  To  introduce  this  technique,  consider  the  following 
system  of  three  equations  in  two  unknowns: 


$$\begin{aligned}
L_1:&\quad x_1 + x_2 = 2 \\
L_2:&\quad x_1 - 2x_2 = -2 \\
L_3:&\quad 3x_1 - x_2 = 3
\end{aligned} \tag{5}$$

The lines $L_1$, $L_2$, $L_3$ determined by these three equations are plotted in the $x_1x_2$-plane. As shown in Figure 10.12.9a, the three lines do not have
a common intersection, and so the three equations do not have an exact solution. However, the points $(x_1, x_2)$ on the shaded triangle formed by
the three lines are all situated “near” these three lines and can be thought of as constituting “approximate” solutions to our system. The
following iterative procedure describes a geometric construction for generating points on the boundary of that triangular region (Figure
10.12.9b):

Algorithm 1

Step 0 Choose an arbitrary starting point $\mathbf{x}_0$ in the $x_1x_2$-plane.

Step 1 Project $\mathbf{x}_0$ orthogonally onto the first line $L_1$ and call the projection $\mathbf{x}_1^{(1)}$. The superscript (1) indicates that this is the first of several
cycles through the steps.

Step 2 Project $\mathbf{x}_1^{(1)}$ orthogonally onto the second line $L_2$ and call the projection $\mathbf{x}_2^{(1)}$.

Step 3 Project $\mathbf{x}_2^{(1)}$ orthogonally onto the third line $L_3$ and call the projection $\mathbf{x}_3^{(1)}$.

Step 4 Take $\mathbf{x}_3^{(1)}$ as the new value of $\mathbf{x}_0$ and cycle through Steps 1 through 3 again. In the second cycle, label the projected points $\mathbf{x}_1^{(2)}$, $\mathbf{x}_2^{(2)}$,
$\mathbf{x}_3^{(2)}$; in the third cycle, label the projected points $\mathbf{x}_1^{(3)}$, $\mathbf{x}_2^{(3)}$, $\mathbf{x}_3^{(3)}$; and so forth.

This algorithm generates three sequences of points

$$\begin{aligned}
L_1:&\quad \mathbf{x}_1^{(1)},\ \mathbf{x}_1^{(2)},\ \mathbf{x}_1^{(3)},\ \ldots \\
L_2:&\quad \mathbf{x}_2^{(1)},\ \mathbf{x}_2^{(2)},\ \mathbf{x}_2^{(3)},\ \ldots \\
L_3:&\quad \mathbf{x}_3^{(1)},\ \mathbf{x}_3^{(2)},\ \mathbf{x}_3^{(3)},\ \ldots
\end{aligned}$$

that lie on the three lines $L_1$, $L_2$, and $L_3$, respectively. It can be shown that as long as the three lines are not all parallel, then the first sequence
converges to a point $\mathbf{x}_1^*$ on $L_1$, the second sequence converges to a point $\mathbf{x}_2^*$ on $L_2$, and the third sequence converges to a point $\mathbf{x}_3^*$ on $L_3$ (Figure
10.12.9c). These three limit points form what is called the limit cycle of the iterative process. It can be shown that the limit cycle is independent
of the starting point $\mathbf{x}_0$.



Figure  10.12.9 


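Next we discuss the specific formulas needed to effect the orthogonal projections in Algorithm 1. First, because the equation of a line in $x_1x_2$-space is

$$a_1x_1 + a_2x_2 = b$$

we can express it in vector form as

$$\mathbf{a}^T\mathbf{x} = b$$

where

$$\mathbf{a} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} \quad\text{and}\quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$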


The  following  theorem  gives  the  necessary  projection  formula  (Exercise  5). 


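THEOREM 10.12.1 Orthogonal Projection Formula

Let L be a line in $R^2$ with equation $\mathbf{a}^T\mathbf{x} = b$, and let $\mathbf{x}^*$ be any point in $R^2$ (Figure 10.12.10). Then the orthogonal projection, $\mathbf{x}_p$, of
$\mathbf{x}^*$ onto L is given by

$$\mathbf{x}_p = \mathbf{x}^* + \frac{b - \mathbf{a}^T\mathbf{x}^*}{\mathbf{a}^T\mathbf{a}}\,\mathbf{a}$$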


Figure  10.12.10 
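
As an illustration only, the projection formula of Theorem 10.12.1 can be sketched in a few lines of Python. The function name `project_onto_hyperplane` and the use of plain lists for vectors are assumptions made for this sketch; nothing in the formula is restricted to two dimensions, so the same code applies to hyperplanes in higher-dimensional spaces.

```python
def project_onto_hyperplane(a, b, x):
    """Return the orthogonal projection of the point x onto a^T x = b.

    a and x are sequences of equal length; b is a scalar.
    (Hypothetical helper name, chosen for this sketch.)
    """
    a_dot_x = sum(ai * xi for ai, xi in zip(a, x))
    a_dot_a = sum(ai * ai for ai in a)
    t = (b - a_dot_x) / a_dot_a  # signed step length along the normal a
    return [xi + t * ai for ai, xi in zip(a, x)]


# Project the point (1, 3) onto the line x1 + x2 = 2.
x_p = project_onto_hyperplane([1.0, 1.0], 2.0, [1.0, 3.0])
print(x_p)  # [0.0, 2.0]
```

The projected point (0, 2) satisfies $x_1 + x_2 = 2$, and the displacement from (1, 3) to (0, 2) is a multiple of $\mathbf{a} = (1, 1)$, as the theorem requires.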


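EXAMPLE 1 Using Algorithm 1

We can use Algorithm 1 to find an approximate solution of the linear system given in 5 and illustrated in Figure 10.12.9. If we
write the equations of the three lines as

$$L_1:\ \mathbf{a}_1^T\mathbf{x} = b_1, \qquad L_2:\ \mathbf{a}_2^T\mathbf{x} = b_2, \qquad L_3:\ \mathbf{a}_3^T\mathbf{x} = b_3$$

where

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},\quad \mathbf{a}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},\quad \mathbf{a}_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix},\quad \mathbf{a}_3 = \begin{bmatrix} 3 \\ -1 \end{bmatrix},\quad b_1 = 2,\quad b_2 = -2,\quad b_3 = 3$$

then, using Theorem 10.12.1, we can express the iteration scheme in Algorithm 1 as

$$\mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k, \qquad k = 1, 2, 3$$

where p = 1 for the first cycle of iterates, p = 2 for the second cycle of iterates, and so forth. After each cycle of iterates (i.e.,
after $\mathbf{x}_3^{(p)}$ is computed), the next cycle of iterates is begun with $\mathbf{x}_0^{(p+1)}$ set equal to $\mathbf{x}_3^{(p)}$.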


Table 1 gives the numerical results of six cycles of iterations starting with the initial point $\mathbf{x}_0 = (1, 3)$.

Table 1

Iterate      x_1        x_2
x_0          1.00000    3.00000
x_1^(1)       .00000    2.00000
x_2^(1)       .40000    1.20000
x_3^(1)      1.30000     .90000
x_1^(2)      1.20000     .80000
x_2^(2)       .88000    1.44000
x_3^(2)      1.42000    1.26000
x_1^(3)      1.08000     .92000
x_2^(3)       .83200    1.41600
x_3^(3)      1.40800    1.22400
x_1^(4)      1.09200     .90800
x_2^(4)       .83680    1.41840
x_3^(4)      1.40920    1.22760
x_1^(5)      1.09080     .90920
x_2^(5)       .83632    1.41816
x_3^(5)      1.40908    1.22724
x_1^(6)      1.09092     .90908
x_2^(6)       .83637    1.41818
x_3^(6)      1.40909    1.22728
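
The cyclic projections of Algorithm 1 can be reproduced with a short program. The following is a minimal sketch in Python, not the procedure used in any particular scanner; the names `project`, `art_cycles`, and `LINES` are assumptions introduced here, and each line is stored as a pair (a, b) representing $\mathbf{a}^T\mathbf{x} = b$.

```python
def project(a, b, x):
    """Orthogonal projection of x onto the line (hyperplane) a^T x = b."""
    t = (b - sum(ai * xi for ai, xi in zip(a, x))) / sum(ai * ai for ai in a)
    return [xi + t * ai for ai, xi in zip(a, x)]


def art_cycles(lines, x0, cycles):
    """Run the given number of cycles of successive projections.

    Returns a list with one entry per cycle; each entry is the list of
    iterates x_1, ..., x_M produced during that cycle.
    """
    history = []
    x = list(x0)
    for _ in range(cycles):
        cycle = []
        for a, b in lines:
            x = project(a, b, x)
            cycle.append(x)
        history.append(cycle)  # the last iterate seeds the next cycle
    return history


# The three lines of system (5), written as (a, b) with a^T x = b.
LINES = [([1.0, 1.0], 2.0), ([1.0, -2.0], -2.0), ([3.0, -1.0], 3.0)]

history = art_cycles(LINES, [1.0, 3.0], 6)
for p, cycle in enumerate(history, start=1):
    for k, x in enumerate(cycle, start=1):
        print(f"x_{k}^({p}) = ({x[0]:.5f}, {x[1]:.5f})")
```

Starting from (1, 3), the printed iterates can be compared row by row with Table 1.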

Using certain techniques that are impractical for large linear systems, we can show the exact values of the points of the limit cycle
in this example to be

$$\begin{aligned}
\mathbf{x}_1^* &= \left(\tfrac{12}{11}, \tfrac{10}{11}\right) = (1.09090\ldots,\ .90909\ldots) \\
\mathbf{x}_2^* &= \left(\tfrac{46}{55}, \tfrac{78}{55}\right) = (.83636\ldots,\ 1.41818\ldots) \\
\mathbf{x}_3^* &= \left(\tfrac{31}{22}, \tfrac{27}{22}\right) = (1.40909\ldots,\ 1.22727\ldots)
\end{aligned}$$

It can be seen that the sixth cycle of iterates provides an excellent approximation to the limit cycle. Any one of the three iterates
$\mathbf{x}_1^{(6)}$, $\mathbf{x}_2^{(6)}$, $\mathbf{x}_3^{(6)}$ can be used as an approximate solution of the linear system. (The large discrepancies in the values of $\mathbf{x}_1^{(6)}$, $\mathbf{x}_2^{(6)}$, and
$\mathbf{x}_3^{(6)}$ are due to the artificial nature of this illustrative example. In practical problems, these discrepancies would be much smaller.)


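To generalize Algorithm 1 so that it applies to an overdetermined system of M equations in N unknowns,

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1N}x_N &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2N}x_N &= b_2 \\
&\ \ \vdots \\
a_{M1}x_1 + a_{M2}x_2 + \cdots + a_{MN}x_N &= b_M
\end{aligned} \tag{6}$$

we introduce column vectors $\mathbf{x}$ and $\mathbf{a}_i$ as follows:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}, \qquad \mathbf{a}_i = \begin{bmatrix} a_{i1} \\ a_{i2} \\ \vdots \\ a_{iN} \end{bmatrix}, \qquad i = 1, 2, \ldots, M$$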


With these vectors, the M equations constituting our linear system 6 can be written in vector form as

$$\mathbf{a}_i^T\mathbf{x} = b_i, \qquad i = 1, 2, \ldots, M$$

Each of these M equations defines what is called a hyperplane in the N-dimensional Euclidean space $R^N$. In general these M hyperplanes have
no common intersection, and so we seek instead some point in $R^N$ that is reasonably “close” to all of them. Such a point will constitute an
approximate solution of the linear system, and its N entries will determine approximate pixel densities with which to form the desired cross
section.


As in the two-dimensional case, we will introduce an iterative process that generates cycles of successive orthogonal projections onto the M
hyperplanes beginning with some arbitrary initial point in $R^N$. Our notation for these successive iterates is

$$\mathbf{x}_k^{(p)} = \left(\begin{array}{c}\text{the iterate lying on the } k\text{th hyperplane}\\ \text{generated during the } p\text{th cycle of iterations}\end{array}\right)$$

The algorithm is as follows:

Algorithm 2

Step 0 Choose any point in $R^N$ and label it $\mathbf{x}_0^{(1)}$.

Step 1 For the first cycle of iterates, set p = 1.

Step 2 For k = 1, 2, ..., M, compute

$$\mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k$$

Step 3 Set $\mathbf{x}_0^{(p+1)} = \mathbf{x}_M^{(p)}$.

Step 4 Increase the cycle number p by 1 and return to Step 2.

In Step 2 the iterate $\mathbf{x}_k^{(p)}$ is called the orthogonal projection of $\mathbf{x}_{k-1}^{(p)}$ onto the hyperplane $\mathbf{a}_k^T\mathbf{x} = b_k$. Consequently, as in the two-dimensional
case, this algorithm determines a sequence of orthogonal projections from one hyperplane onto the next in which we cycle back to the first
hyperplane after each projection onto the last hyperplane.

It can be shown that if the vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_M$ span $R^N$, then the iterates $\mathbf{x}_M^{(1)}, \mathbf{x}_M^{(2)}, \mathbf{x}_M^{(3)}, \ldots$ lying on the Mth hyperplane will converge to a
point $\mathbf{x}_M^*$ on that hyperplane which does not depend on the choice of the initial point $\mathbf{x}_0^{(1)}$. In computed tomography, one of the iterates $\mathbf{x}_M^{(p)}$ for p
sufficiently large is taken as an approximate solution of the linear system for the pixel densities.


Note that for the center-of-pixel method, the scalar quantity $\mathbf{a}_k^T\mathbf{a}_k$ appearing in the equation in Step 2 of the algorithm is simply the number of
pixels in which the kth beam passes through the center. Similarly, note that the scalar quantity

$$b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}$$

in that same equation can be interpreted as the excess kth beam density that results if the pixel densities are set equal to the entries of $\mathbf{x}_{k-1}^{(p)}$. This
provides the following interpretation of our ART iteration scheme for the center-of-pixel method: Generate the pixel densities of each iterate by
distributing the excess beam density of successive beams in the scan evenly among those pixels in which the beam passes through the center.
When the last beam in the scan has been reached, return to the first beam and continue.

EXAMPLE  2 Using  Algorithm  2 

We can use Algorithm 2 to find the unknown pixel densities of the 9 pixels arranged in the 3 x 3 array illustrated in Figure
10.12.11. These 9 pixels are scanned using the parallel mode with 12 beams whose measured beam densities are indicated in the
figure. We choose the center-of-pixel method to set up the 12 beam equations. (In Exercises 7 and 8, you are asked to set up the
beam equations using the center line and area methods.) As you can verify, the beam equations are

$$\begin{aligned}
x_7 + x_8 + x_9 &= 13.00 & \qquad x_3 + x_6 + x_9 &= 18.00 \\
x_4 + x_5 + x_6 &= 15.00 & \qquad x_2 + x_5 + x_8 &= 12.00 \\
x_1 + x_2 + x_3 &= 8.00 & \qquad x_1 + x_4 + x_7 &= 6.00 \\
x_6 + x_8 + x_9 &= 14.79 & \qquad x_2 + x_3 + x_6 &= 10.51 \\
x_3 + x_5 + x_7 &= 14.31 & \qquad x_1 + x_5 + x_9 &= 16.13 \\
x_1 + x_2 + x_4 &= 3.81 & \qquad x_4 + x_7 + x_8 &= 7.04
\end{aligned}$$

Table 2 illustrates the results of the iteration scheme starting with an initial $\mathbf{x}_0^{(1)} = \mathbf{0}$. The table gives the values of each of the first
cycle of iterates, $\mathbf{x}_1^{(1)}$ through $\mathbf{x}_{12}^{(1)}$, but thereafter gives the iterates $\mathbf{x}_{12}^{(p)}$ only for various values of p. The iterates $\mathbf{x}_{12}^{(p)}$ start
repeating to two decimal places for $p \ge 45$, and so we take the entries of $\mathbf{x}_{12}^{(45)}$ as approximate values of the 9 pixel densities.


Figure  10.12.11 


Table 2

                                  Pixel Densities
Iterate      x_1    x_2    x_3    x_4    x_5    x_6    x_7    x_8    x_9
x_0^(1)      .00    .00    .00    .00    .00    .00    .00    .00    .00
x_1^(1)      .00    .00    .00    .00    .00    .00   4.33   4.33   4.33
x_2^(1)      .00    .00    .00   5.00   5.00   5.00   4.33   4.33   4.33
x_3^(1)     2.67   2.67   2.67   5.00   5.00   5.00   4.33   4.33   4.33
x_4^(1)     2.67   2.67   2.67   5.00   5.00   5.37   4.33   4.71   4.71
x_5^(1)     2.67   2.67   3.44   5.00   5.77   5.37   5.10   4.71   4.71
x_6^(1)      .49    .49   3.44   2.83   5.77   5.37   5.10   4.71   4.71
x_7^(1)      .49    .49   4.93   2.83   5.77   6.87   5.10   4.71   6.20
x_8^(1)      .49    .84   4.93   2.83   6.11   6.87   5.10   5.05   6.20
x_9^(1)     -.31    .84   4.93   2.02   6.11   6.87   4.30   5.05   6.20
x_10^(1)    -.31    .13   4.22   2.02   6.11   6.16   4.30   5.05   6.20
x_11^(1)    1.06    .13   4.22   2.02   7.49   6.16   4.30   5.05   7.58
x_12^(1)    1.06    .13   4.22    .58   7.49   6.16   2.85   3.61   7.58
x_12^(2)    2.03    .69   4.42   1.34   7.49   5.39   2.65   3.04   6.61
x_12^(3)    1.78    .51   4.52   1.26   7.49   5.48   2.56   3.22   6.86
x_12^(4)    1.82    .52   4.62   1.37   7.49   5.37   2.45   [illegible]   6.82
x_12^(5)    1.79    .49   4.71   1.43   7.49   5.31   2.37   3.25   6.85
x_12^(10)   1.68    .44   5.03   1.70   7.49   5.03   2.04   3.29   6.96
x_12^(20)   1.49    .48   5.29   2.00   7.49   4.73   1.79   3.25   7.15
x_12^(30)   1.38    .55   5.34   2.11   7.49   4.62   1.74   3.19   7.26
x_12^(40)   1.33    .59   5.33   2.14   7.49   4.59   1.75   3.15   7.31
x_12^(45)   1.32    .60   5.32   2.15   7.49   4.59   1.76   3.14   7.32
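
For the center-of-pixel method, each projection step amounts to spreading a beam's excess density evenly over the pixels whose centers it crosses, so the iteration can be coded without forming the coefficient matrix. The sketch below is a hedged illustration in Python: the names `BEAMS` and `art_center_of_pixel` are hypothetical, and the beam order is an assumption inferred from the first-cycle rows of Table 2 (it is the order in which the equations are listed in the answer to Exercise 7).

```python
# Each beam is (pixel numbers whose centers it crosses, measured beam density).
# Beam order is an assumption inferred from the first cycle shown in Table 2.
BEAMS = [
    ((7, 8, 9), 13.00), ((4, 5, 6), 15.00), ((1, 2, 3), 8.00),
    ((6, 8, 9), 14.79), ((3, 5, 7), 14.31), ((1, 2, 4), 3.81),
    ((3, 6, 9), 18.00), ((2, 5, 8), 12.00), ((1, 4, 7), 6.00),
    ((2, 3, 6), 10.51), ((1, 5, 9), 16.13), ((4, 7, 8), 7.04),
]


def art_center_of_pixel(beams, n_pixels, cycles):
    """Run ART cycles for 0/1 beam coefficients, starting from all zeros."""
    x = [0.0] * n_pixels
    for _ in range(cycles):
        for pixels, b in beams:
            # Excess beam density: measured value minus current pixel sum.
            excess = b - sum(x[j - 1] for j in pixels)
            share = excess / len(pixels)  # a_k^T a_k = number of pixels hit
            for j in pixels:
                x[j - 1] += share
    return x


first = art_center_of_pixel(BEAMS, 9, 1)
print("after cycle 1: ", [round(v, 2) for v in first])
later = art_center_of_pixel(BEAMS, 9, 45)
print("after cycle 45:", [round(v, 2) for v in later])
```

The output after one cycle can be compared with the row for $\mathbf{x}_{12}^{(1)}$ in Table 2, and the output after 45 cycles with the last row of the table.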

We  close  this  section  by  noting  that  the  field  of  computed  tomography  is  presently  a very  active  research  area.  In  fact,  the  ART  scheme 
discussed  here  has  been  replaced  in  commercial  systems  by  more  sophisticated  techniques  that  are  faster  and  provide  a more  accurate  view  of 
the  cross  section.  However,  all  the  new  techniques  address  the  same  basic  mathematical  problem:  finding  a good  approximate  solution  of  a 
large  overdetermined  inconsistent  linear  system  of  equations. 


Exercise  Set  10.12 


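1.

(a) Setting $\mathbf{x}_k^{(p)} = \left(x_{k1}^{(p)}, x_{k2}^{(p)}\right)$, show that the three projection equations

$$\mathbf{x}_k^{(p)} = \mathbf{x}_{k-1}^{(p)} + \frac{b_k - \mathbf{a}_k^T\mathbf{x}_{k-1}^{(p)}}{\mathbf{a}_k^T\mathbf{a}_k}\,\mathbf{a}_k, \qquad k = 1, 2, 3$$

for the three lines in Equation 5 can be written as

$$\begin{aligned}
k = 1:&\quad x_{11}^{(p)} = \tfrac{1}{2}\left[2 + x_{01}^{(p)} - x_{02}^{(p)}\right], &\quad x_{12}^{(p)} &= \tfrac{1}{2}\left[2 - x_{01}^{(p)} + x_{02}^{(p)}\right] \\
k = 2:&\quad x_{21}^{(p)} = \tfrac{1}{5}\left[-2 + 4x_{11}^{(p)} + 2x_{12}^{(p)}\right], &\quad x_{22}^{(p)} &= \tfrac{1}{5}\left[4 + 2x_{11}^{(p)} + x_{12}^{(p)}\right] \\
k = 3:&\quad x_{31}^{(p)} = \tfrac{1}{10}\left[9 + x_{21}^{(p)} + 3x_{22}^{(p)}\right], &\quad x_{32}^{(p)} &= \tfrac{1}{10}\left[-3 + 3x_{21}^{(p)} + 9x_{22}^{(p)}\right]
\end{aligned}$$

where $\left(x_{01}^{(p+1)}, x_{02}^{(p+1)}\right) = \left(x_{31}^{(p)}, x_{32}^{(p)}\right)$ for p = 1, 2, ....

(b) Show that the three pairs of equations in part (a) can be combined to produce

$$\begin{aligned}
x_{31}^{(p)} &= \tfrac{1}{20}\left[28 + x_{31}^{(p-1)} - x_{32}^{(p-1)}\right] \\
x_{32}^{(p)} &= \tfrac{1}{20}\left[24 + 3x_{31}^{(p-1)} - 3x_{32}^{(p-1)}\right]
\end{aligned} \qquad p = 1, 2, \ldots$$

[Note: Using this pair of equations, we can perform one complete cycle of three orthogonal
projections in a single step.]

(c) Because $\mathbf{x}_3^{(p)}$ tends to the limit point $\mathbf{x}_3^*$ as $p \to \infty$, the equations in part (b) become

$$\begin{aligned}
x_{31}^* &= \tfrac{1}{20}\left[28 + x_{31}^* - x_{32}^*\right] \\
x_{32}^* &= \tfrac{1}{20}\left[24 + 3x_{31}^* - 3x_{32}^*\right]
\end{aligned}$$

as $p \to \infty$. Solve this linear system for $\mathbf{x}_3^* = \left(x_{31}^*, x_{32}^*\right)$. [Note: The simplifications of the ART formulas described in this exercise are
impractical for the large linear systems that arise in realistic computed tomography problems.]


Answer:

(c) $\mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$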


2. Use the result of Exercise 1(b) to find $\mathbf{x}_3^{(1)}, \mathbf{x}_3^{(2)}, \ldots, \mathbf{x}_3^{(6)}$ to five decimal places in Example 1 using the following initial points:

(a) $\mathbf{x}_0 = (0, 0)$

(b) $\mathbf{x}_0 = (1, 1)$

(c) $\mathbf{x}_0 = (148, -15)$


Answer:

(a) x_3^(1) = (1.40000, 1.20000)
    x_3^(2) = (1.41000, 1.23000)
    x_3^(3) = (1.40900, 1.22700)
    x_3^(4) = (1.40910, 1.22730)
    x_3^(5) = (1.40909, 1.22727)
    x_3^(6) = (1.40909, 1.22727)

(b) Same as part (a)

(c) x_3^(1) = (9.55000, 25.65000)
    x_3^(2) = (.59500, -1.21500)
    x_3^(3) = (1.49050, 1.47150)
    x_3^(4) = (1.40095, 1.20285)
    x_3^(5) = (1.40991, 1.22972)
    x_3^(6) = (1.40901, 1.22703)
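
One way to check answers of this kind is to code the one-step update of Exercise 1(b), read here as $x_{31}^{(p)} = \tfrac{1}{20}[28 + x_{31}^{(p-1)} - x_{32}^{(p-1)}]$ and $x_{32}^{(p)} = \tfrac{1}{20}[24 + 3x_{31}^{(p-1)} - 3x_{32}^{(p-1)}]$, with the starting point $\mathbf{x}_0$ playing the role of $\mathbf{x}_3^{(0)}$. The Python sketch below uses the hypothetical names `one_cycle` and `iterate`.

```python
def one_cycle(x3):
    """Map x_3^(p-1) to x_3^(p): one full cycle of three projections."""
    d = x3[0] - x3[1]
    return ((28 + d) / 20, (24 + 3 * d) / 20)


def iterate(x0, cycles):
    """Return [x_3^(1), ..., x_3^(cycles)] starting from x0."""
    out, x = [], x0
    for _ in range(cycles):
        x = one_cycle(x)
        out.append(x)
    return out


for start in [(0.0, 0.0), (1.0, 1.0), (148.0, -15.0)]:
    print("start:", start)
    for p, x in enumerate(iterate(start, 6), start=1):
        print(f"  x_3^({p}) = ({x[0]:.5f}, {x[1]:.5f})")
```

Because the update depends on the starting point only through the difference of its two coordinates, the starting points (0, 0) and (1, 1) produce identical iterates, which is why parts (a) and (b) have the same answer.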


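3.

(a) Show directly that the points of the limit cycle in Example 1,

$$\mathbf{x}_1^* = \left(\tfrac{12}{11}, \tfrac{10}{11}\right), \qquad \mathbf{x}_2^* = \left(\tfrac{46}{55}, \tfrac{78}{55}\right), \qquad \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$$

form a triangle whose vertices lie on the lines $L_1$, $L_2$, and $L_3$ and whose sides are perpendicular to these lines (Figure 10.12.9c).

(b) Using the equations derived in Exercise 1(a), show that if $\mathbf{x}_0^{(1)} = \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)$, then

$$\begin{aligned}
\mathbf{x}_1^{(1)} &= \mathbf{x}_1^* = \left(\tfrac{12}{11}, \tfrac{10}{11}\right) \\
\mathbf{x}_2^{(1)} &= \mathbf{x}_2^* = \left(\tfrac{46}{55}, \tfrac{78}{55}\right) \\
\mathbf{x}_3^{(1)} &= \mathbf{x}_3^* = \left(\tfrac{31}{22}, \tfrac{27}{22}\right)
\end{aligned}$$

[Note: Either part of this exercise shows that successive orthogonal projections of any point on the limit cycle will move around the
limit cycle indefinitely.]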


4. The following three lines in the $x_1x_2$-plane,

$$\begin{aligned}
L_1:&\quad x_2 = 1 \\
L_2:&\quad x_1 - x_2 = 2 \\
L_3:&\quad x_1 - x_2 = 0
\end{aligned}$$

do not have a common intersection. Draw an accurate sketch of the three lines and graphically perform several cycles of the orthogonal
projections described in Algorithm 1, beginning with the initial point $\mathbf{x}_0 = (0, 0)$. On the basis of your sketch, determine the three points of
the limit cycle.


Answer:

$\mathbf{x}_1^* = (1, 1)$, $\mathbf{x}_2^* = (2, 0)$, $\mathbf{x}_3^* = (1, 1)$

5. Prove Theorem 10.12.1 by verifying that

(a) the point $\mathbf{x}_p$ as defined in the theorem lies on the line $\mathbf{a}^T\mathbf{x} = b$ (i.e., $\mathbf{a}^T\mathbf{x}_p = b$).

(b) the vector $\mathbf{x}_p - \mathbf{x}^*$ is orthogonal to the line $\mathbf{a}^T\mathbf{x} = b$ (i.e., $\mathbf{x}_p - \mathbf{x}^*$ is parallel to $\mathbf{a}$).

6. As stated in the text, the iterates $\mathbf{x}_M^{(1)}, \mathbf{x}_M^{(2)}, \mathbf{x}_M^{(3)}, \ldots$ defined in Algorithm 2 will converge to a unique limit point $\mathbf{x}_M^*$ if the vectors
$\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_M$ span $R^N$. Show that if this is the case and if the center-of-pixel method is used, then the center of each of the N pixels in the
field of view is crossed by at least one of the M beams in the scan.

7.  Construct  the  12  beam  equations  in  Example  2 using  the  center  line  method.  Assume  that  the  distance  between  the  center  lines  of  adjacent 
beams  is  equal  to  the  width  of  a single  pixel. 


Answer: 


x_7 + x_8 + x_9 = 13.00
x_4 + x_5 + x_6 = 15.00
x_1 + x_2 + x_3 = 8.00
.82843(x_6 + x_8) + .58579x_9 = 14.79
1.41421(x_3 + x_5 + x_7) = 14.31
.82843(x_2 + x_4) + .58579x_1 = 3.81
x_3 + x_6 + x_9 = 18.00
x_2 + x_5 + x_8 = 12.00
x_1 + x_4 + x_7 = 6.00
.82843(x_2 + x_6) + .58579x_3 = 10.51
1.41421(x_1 + x_5 + x_9) = 16.13
.82843(x_4 + x_8) + .58579x_7 = 7.04

8.  Construct  the  12  beam  equations  in  Example  2 using  the  area  method.  Assume  that  the  width  of  each  beam  is  equal  to  the  width  of  a single 
pixel  and  that  the  distance  between  the  center  lines  of  adjacent  beams  is  also  equal  to  the  width  of  a single  pixel. 

Answer: 

x_7 + x_8 + x_9 = 13.00
x_4 + x_5 + x_6 = 15.00
x_1 + x_2 + x_3 = 8.00
.04289(x_3 + x_5 + x_7) + .75000(x_6 + x_8) + .61396x_9 = 14.79
.91421(x_3 + x_5 + x_7) + .25000(x_2 + x_4 + x_6 + x_8) = 14.31
.04289(x_3 + x_5 + x_7) + .75000(x_2 + x_4) + .61396x_1 = 3.81
x_3 + x_6 + x_9 = 18.00
x_2 + x_5 + x_8 = 12.00
x_1 + x_4 + x_7 = 6.00
.04289(x_1 + x_5 + x_9) + .75000(x_2 + x_6) + .61396x_3 = 10.51
.91421(x_1 + x_5 + x_9) + .25000(x_2 + x_4 + x_6 + x_8) = 16.13
.04289(x_1 + x_5 + x_9) + .75000(x_4 + x_8) + .61396x_7 = 7.04

Technology  Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or
Mathcad,  but  it  may  also  be  some  other  type  of  linear  algebra  software  or  a scientific  calculator  with  some  linear  algebra  capabilities.  For  each 
exercise  you  will  need  to  read  the  relevant  documentation  for  the  particular  utility  you  are  using.  The  goal  of  these  exercises  is  to  provide  you 
with  a basic  proficiency  with  your  technology  utility.  Once  you  have  mastered  the  techniques  in  these  exercises,  you  will  be  able  to  use  your 
technology  utility  to  solve  many  of  the  problems  in  the  regular  exercise  sets. 


T1. Given the set of equations

$$a_kx + b_ky = c_k$$

for k = 1, 2, 3, ..., n (with n > 2), let us consider the following algorithm for obtaining an approximate solution to the system.

1. Solve all possible pairs of equations

$$a_ix + b_iy = c_i \quad\text{and}\quad a_jx + b_jy = c_j$$

for i, j = 1, 2, 3, ..., n and i < j for their unique solutions. This leads to

$$\tfrac{1}{2}n(n-1)$$

solutions, which we label as

$$(x_{ij}, y_{ij})$$

for i, j = 1, 2, 3, ..., n and i < j.

2. Construct the geometric center of these points defined by

$$(x_C, y_C) = \left(\frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} x_{ij},\ \ \frac{2}{n(n-1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} y_{ij}\right)$$

and use this as the approximate solution to the original system.

Use this algorithm to approximate the solution to the system

$$\begin{aligned}
x + y &= 2 \\
x - 2y &= -2 \\
3x - y &= 3
\end{aligned}$$

and compare your results to those in this section.
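
As one possible way to carry out the pairwise-intersection algorithm of Exercise T1 with a general-purpose language rather than a dedicated utility, the following Python sketch uses exact rational arithmetic. The names `intersect`, `geometric_center`, and `LINES` are assumptions of this sketch; each line $a x + b y = c$ is stored as the integer triple (a, b, c), and each pair is solved by Cramer's rule.

```python
from fractions import Fraction
from itertools import combinations


def intersect(l1, l2):
    """Unique intersection of a1 x + b1 y = c1 and a2 x + b2 y = c2."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("parallel lines have no unique intersection")
    # Cramer's rule with exact fractions (integer coefficients assumed).
    return (Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det))


def geometric_center(lines):
    """Average of the n(n-1)/2 pairwise intersection points."""
    points = [intersect(p, q) for p, q in combinations(lines, 2)]
    count = len(points)
    return (sum(p[0] for p in points) / count,
            sum(p[1] for p in points) / count)


LINES = [(1, 1, 2), (1, -2, -2), (3, -1, 3)]  # x+y=2, x-2y=-2, 3x-y=3
xc, yc = geometric_center(LINES)
print(xc, yc)                # exact values
print(float(xc), float(yc))  # decimal approximations
```

For three lines the pairwise intersections are the vertices of the triangle they form, so the geometric center here is the centroid of that triangle.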

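T2. (Calculus required) Given the set of equations

$$a_kx + b_ky = c_k$$

for k = 1, 2, 3, ..., n (with n > 2), let us consider the following least squares algorithm for obtaining an approximate solution $(x^*, y^*)$ to the
system. Given a point $(\alpha, \beta)$ and the line $a_ix + b_iy = c_i$, the distance from this point to the line is given by

$$\frac{|a_i\alpha + b_i\beta - c_i|}{\sqrt{a_i^2 + b_i^2}}$$

If we define a function f(x, y) by

$$f(x, y) = \sum_{i=1}^{n} \frac{(a_ix + b_iy - c_i)^2}{a_i^2 + b_i^2}$$

and then determine the point $(x^*, y^*)$ that minimizes this function, we will determine the point that is closest to each of these lines in a
summed least squares sense. Show that $x^*$ and $y^*$ are solutions to the system

$$\left(\sum_{i=1}^{n} \frac{a_i^2}{a_i^2 + b_i^2}\right)x^* + \left(\sum_{i=1}^{n} \frac{a_ib_i}{a_i^2 + b_i^2}\right)y^* = \sum_{i=1}^{n} \frac{a_ic_i}{a_i^2 + b_i^2}$$

and

$$\left(\sum_{i=1}^{n} \frac{a_ib_i}{a_i^2 + b_i^2}\right)x^* + \left(\sum_{i=1}^{n} \frac{b_i^2}{a_i^2 + b_i^2}\right)y^* = \sum_{i=1}^{n} \frac{b_ic_i}{a_i^2 + b_i^2}$$

Apply this algorithm to the system

$$\begin{aligned}
x + y &= 2 \\
x - 2y &= -2 \\
3x - y &= 3
\end{aligned}$$

and compare your results to those in this section.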
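
A hedged sketch of the least squares computation in Exercise T2 follows, again in Python with exact fractions. It assumes the function to be minimized is $f(x, y) = \sum_i (a_ix + b_iy - c_i)^2/(a_i^2 + b_i^2)$ and that setting both partial derivatives to zero gives a 2 x 2 linear system whose coefficients are the weighted sums accumulated below; the names `least_squares_point` and `LINES` are hypothetical.

```python
from fractions import Fraction


def least_squares_point(lines):
    """Minimize the sum of squared distances to the lines a x + b y = c.

    Solves the 2 x 2 normal equations
        saa * x + sab * y = sac
        sab * x + sbb * y = sbc
    where each sum is weighted by 1 / (a^2 + b^2).
    """
    saa = sab = sbb = sac = sbc = Fraction(0)
    for a, b, c in lines:
        w = Fraction(1, a * a + b * b)  # integer coefficients assumed
        saa += w * a * a
        sab += w * a * b
        sbb += w * b * b
        sac += w * a * c
        sbc += w * b * c
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("all lines are parallel; the minimizer is not unique")
    x = (sac * sbb - sab * sbc) / det  # Cramer's rule
    y = (saa * sbc - sab * sac) / det
    return x, y


LINES = [(1, 1, 2), (1, -2, -2), (3, -1, 3)]  # x+y=2, x-2y=-2, 3x-y=3
x_star, y_star = least_squares_point(LINES)
print(x_star, y_star)
print(float(x_star), float(y_star))
```

Dividing each squared residual by $a_i^2 + b_i^2$ makes the result independent of how each line's equation is scaled, which is the design choice that distinguishes this from an unweighted least squares fit of the same three equations.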
10.13 Fractals

In  this  section  we  will  use  certain  classes  of  linear  transformations  to  describe  and  generate  intricate  sets  in  the  Euclidean  plane.  These  sets,  called  fractals,  are 
currently  the  focus  of  much  mathematical  and  scientific  research. 


Prerequisites 

Geometry of Linear Operators on $R^2$ (Section 4.11)

Euclidean Space $R^n$

Natural  Logarithms 

Intuitive  Understanding  of  Limits 


Fractals  in  the  Euclidean  Plane 

At  the  end  of  the  nineteenth  century  and  the  beginning  of  the  twentieth  century,  various  bizarre  and  wild  sets  of  points  in  the  Euclidean  plane  began  appearing  in 
mathematics.  Although  they  were  initially  mathematical  curiosities,  these  sets,  called  fractals,  are  rapidly  growing  in  importance.  It  is  now  recognized  that  they  reveal 
a regularity  in  physical  and  biological  phenomena  previously  dismissed  as  “random,”  “noisy,”  or  “chaotic.”  For  example,  fractals  are  all  around  us  in  the  shapes  of 
clouds,  mountains,  coastlines,  trees,  and  ferns. 

In this section we give a brief description of certain types of fractals in the Euclidean plane $R^2$. Much of this description is an outgrowth of the work of two
mathematicians,  Benoit  B.  Mandelbrot  and  Michael  Barnsley,  who  are  both  active  researchers  in  the  field. 


Self-Similar  Sets 


To begin our study of fractals, we need to introduce some terminology about sets in $R^2$. We will call a set in $R^2$ bounded if it can be enclosed by a suitably large
circle (Figure 10.13.1) and closed if it contains all of its boundary points (Figure 10.13.2). Two sets in $R^2$ will be called congruent if they can be made to coincide
exactly by translating and rotating them appropriately within $R^2$ (Figure 10.13.3). We will also rely on your intuitive concept of overlapping and nonoverlapping
sets, as illustrated in Figure 10.13.4.


Figure 10.13.1 (a) Bounded set: the set is enclosed by a circle. (b) Unbounded set: this set cannot be enclosed by any circle.

Figure 10.13.2 Closed set: the boundary points (solid color) lie in the set.

Figure 10.13.3 Congruent sets

Figure 10.13.4 (a) Overlapping sets (b) Nonoverlapping sets

If $T: R^2 \to R^2$ is the linear operator that scales by a factor of s (see Table 7 of Section 4.9), and if Q is a set in $R^2$, then the set T(Q) (the set of images of points in Q
under T) is called a dilation of the set Q if s > 1 and a contraction of Q if 0 < s < 1 (Figure 10.13.5). In either case we say that T(Q) is the set Q scaled by the factor
s.


The types of fractals we will consider first are called self-similar. In general, we define a self-similar set in $R^2$ as follows:


DEFINITION 1

A closed and bounded subset of the Euclidean plane $R^2$ is said to be self-similar if it can be expressed in the form

$$S = S_1 \cup S_2 \cup S_3 \cup \cdots \cup S_k \tag{1}$$

where $S_1, S_2, S_3, \ldots, S_k$ are nonoverlapping sets, each of which is congruent to S scaled by the same factor s (0 < s < 1).


If S is a self-similar set, then 1 is sometimes called a decomposition of S into nonoverlapping congruent sets.

EXAMPLE 1 Line Segment


A line segment in $R^2$ (Figure 10.13.6a) can be expressed as the union of two nonoverlapping congruent line segments (Figure 10.13.6b). In Figure
10.13.6b we have separated the two line segments slightly so that they can be seen more easily. Each of these two smaller line segments is congruent to
the original line segment scaled by a factor of $\tfrac{1}{2}$. Hence, a line segment is a self-similar set with k = 2 and $s = \tfrac{1}{2}$.


Figure 10.13.6


EXAMPLE 2 Square


A square (Figure 10.13.7a) can be expressed as the union of four nonoverlapping congruent squares (Figure 10.13.7b), where we have again separated
the smaller squares slightly. Each of the four smaller squares is congruent to the original square scaled by a factor of $\tfrac{1}{2}$. Hence, a square is a self-similar
set with k = 4 and $s = \tfrac{1}{2}$.


Figure 10.13.7


EXAMPLE 3 Sierpinski Carpet


The set suggested by Figure 10.13.8a, the Sierpinski “carpet,” was first described by the Polish mathematician Waclaw Sierpinski (1882-1969). It can
be expressed as the union of eight nonoverlapping congruent subsets (Figure 10.13.8b), each of which is congruent to the original set scaled by a factor
of $\tfrac{1}{3}$. Hence, it is a self-similar set with k = 8 and $s = \tfrac{1}{3}$. Note that the intricate square-within-a-square pattern continues forever on a smaller and
smaller scale (although this can only be suggested in a figure such as the one shown).


Figure 10.13.8

EXAMPLE 4 Sierpinski Triangle

Figure 10.13.9a illustrates another set described by Sierpinski. It is a self-similar set with k = 3 and $s = \tfrac{1}{2}$ (Figure 10.13.9b). As with the Sierpinski
carpet, the intricate triangle-within-a-triangle pattern continues forever on a smaller and smaller scale.


Figure 10.13.9


The  Sierpinski  carpet  and  triangle  have  a more  intricate  structure  than  the  line  segment  and  the  square  in  that  they  exhibit  a pattern  that  is  repeated  indefinitely.  This 
difference  will  be  explored  later  in  this  section. 


Topological  Dimension  of  a Set 

In Section 4.5 we defined the dimension of a subspace of a vector space to be the number of vectors in a basis, and we found that definition to coincide with our
intuitive sense of dimension. For example, the origin of $R^2$ is zero-dimensional, lines through the origin are one-dimensional, and $R^2$ itself is two-dimensional. This
definition of dimension is a special case of a more general concept called topological dimension, which is applicable to sets in $R^n$ that are not necessarily subspaces.
A precise definition of this concept is studied in a branch of mathematics called topology. Although that definition is beyond the scope of this text, we can state
informally that

a point in $R^2$ has topological dimension zero;
a curve in $R^2$ has topological dimension one;
a region in $R^2$ has topological dimension two.

It can be proved that the topological dimension of a set in $R^n$ must be an integer between 0 and n, inclusive. In this text we will denote the topological dimension of a
set S by $d_T(S)$.

EXAMPLE  5 Topological  Dimensions  of  Sets 

Table  1 gives  the  topological  dimensions  of  the  sets  studied  in  our  earlier  examples.  The  first  two  results  in  this  table  are  intuitively  obvious;  however, 
the  last  two  are  not.  Informally  stated,  the  Sierpinski  carpet  and  triangle  both  contain  so  many  “holes”  that  those  sets  resemble  web-like  networks  of 
lines  rather  than  regions.  Hence  they  have  topological  dimension  one.  The  proofs  are  quite  difficult. 

Table 1

Set S                   d_T(S)
Line segment            1
Square                  2
Sierpinski carpet       1
Sierpinski triangle     1


Hausdorff  Dimension  of  a Self-Similar  Set 

In 1919 the German mathematician Felix Hausdorff (1868-1942) gave an alternative definition for the dimension of an arbitrary set in $R^n$. His definition is quite
complicated, but for a self-similar set, it reduces to something rather simple:


DEFINITION 2


The Hausdorff dimension of a self-similar set S of form 1 is denoted by $d_H(S)$ and is defined by

$$d_H(S) = \frac{\ln k}{\ln(1/s)} \tag{2}$$


In this definition, “ln” denotes the natural logarithm function. Equation 2 can also be expressed as

$$s^{d_H(S)} = \frac{1}{k} \tag{3}$$

in which the Hausdorff dimension $d_H(S)$ appears as an exponent. Formula 3 is more helpful for interpreting the concept of Hausdorff dimension; it states, for
example, that if you scale a self-similar set by a factor of $s = \tfrac{1}{2}$, then its area (or more properly its measure) decreases by a factor of $\left(\tfrac{1}{2}\right)^{d_H(S)}$. Thus, scaling a line
segment by a factor of $\tfrac{1}{2}$ reduces its measure (length) by a factor of $\left(\tfrac{1}{2}\right)^1 = \tfrac{1}{2}$,
and scaling a square region by a factor of $\tfrac{1}{2}$ reduces its measure (area) by a factor of $\left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4}$.


Before  proceeding  to  some  examples,  we  should  note  a few  facts  about  the  Hausdorff  dimension  of  a set: 

The  topological  dimension  and  Hausdorff  dimension  of  a set  need  not  be  the  same. 

The  Hausdorff  dimension  of  a set  need  not  be  an  integer. 

The topological dimension of a set is less than or equal to its Hausdorff dimension; that is, $d_T(S) \le d_H(S)$.

EXAMPLE  6 Hausdorff  Dimensions  of  Sets 

Table  2 lists  the  Hausdorff  dimensions  of  the  sets  studied  in  our  earlier  examples. 

Table 2

Set S                  s      k    $d_H(S) = \ln k / \ln(1/s)$
Line segment           1/2    2    ln 2 / ln 2 = 1
Square                 1/2    4    ln 4 / ln 2 = 2
Sierpinski carpet      1/3    8    ln 8 / ln 3 = 1.892...
Sierpinski triangle    1/2    3    ln 3 / ln 2 = 1.584...
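The entries of Table 2 can be checked numerically. The following is a minimal sketch of Equation 2 in Python; the function name `hausdorff_dimension` and the list `TABLE_2` are illustrative names, not part of the text.

```python
import math


def hausdorff_dimension(k, s):
    """Equation 2: d_H(S) = ln k / ln(1/s) for k nonoverlapping copies of S scaled by s."""
    if k < 1 or not (0 < s < 1):
        raise ValueError("need k >= 1 and 0 < s < 1")
    return math.log(k) / math.log(1 / s)


# (name, s, k) rows taken from Table 2
TABLE_2 = [
    ("Line segment", 1 / 2, 2),
    ("Square", 1 / 2, 4),
    ("Sierpinski carpet", 1 / 3, 8),
    ("Sierpinski triangle", 1 / 2, 3),
]

for name, s, k in TABLE_2:
    print(f"{name:20s} d_H = {hausdorff_dimension(k, s):.4f}")
```

The formula applies only when the k scaled copies are nonoverlapping; the function does not check that condition.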

Fractals 

Comparing  Tables  1 and  2,  we  see  that  the  Hausdorff  and  topological  dimensions  are  equal  for  both  the  line  segment  and  square  but  are  unequal  for  the  Sierpinski 
carpet  and  triangle.  In  1977  Benoit  B.  Mandelbrot  suggested  that  sets  for  which  the  topological  and  Hausdorff  dimensions  differ  must  be  quite  complicated  (as 
Hausdorff  had  earlier  suggested  in  1919).  Mandelbrot  proposed  calling  such  sets  fractals , and  he  offered  the  following  definition. 



DEFINITION  3 

A fractal  is  a subset  of  a Euclidean  space  whose  Hausdorff  dimension  and  topological  dimension  are  not  equal. 


According to this definition, the Sierpinski carpet and Sierpinski triangle are fractals, whereas the line segment and square are not.

It  follows  from  the  preceding  definition  that  a set  whose  Hausdorff  dimension  is  not  an  integer  must  be  a fractal  (why?).  However,  we  will  see  later  that  the  converse 
is  not  true;  that  is,  it  is  possible  for  a fractal  to  have  an  integer  Hausdorff  dimension. 


Similitudes 

We  will  now  show  how  some  techniques  from  linear  algebra  can  be  used  to  generate  fractals.  This  linear  algebra  approach  also  leads  to  algorithms  that  can  be 
exploited  to  draw  fractals  on  a computer.  We  begin  with  a definition. 


DEFINITION  4 


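A similitude with scale factor s is a mapping of $R^2$ into $R^2$ of the form

\[
T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = s\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix}
\]

where s, $\theta$, e, and f are scalars.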

Geometrically, a similitude is a composition of three simpler mappings: a scaling by a factor of s, a rotation about the origin through an angle $\theta$, and a translation (e
units in the x-direction and f units in the y-direction). Figure 10.13.10 illustrates the effect of a similitude on the unit square U.
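The scale-rotate-translate composition can be sketched in a few lines of Python. The helper name `similitude` and the parameter values below are illustrative assumptions, chosen only to show the effect on the corners of the unit square.

```python
import math


def similitude(s, theta, e, f):
    """Return T(x, y) = s * R(theta) * (x, y) + (e, f) as a Python function."""
    c, sn = math.cos(theta), math.sin(theta)

    def T(point):
        x, y = point
        # rotate through theta, scale by s, then translate by (e, f)
        return (s * (c * x - sn * y) + e, s * (sn * x + c * y) + f)

    return T


unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # corners of U
T = similitude(0.5, math.pi / 2, 1.0, 0.0)       # illustrative parameter values
image = [T(p) for p in unit_square]              # corners of T(U)
print(image)
```

Every distance between image points is s times the corresponding distance in U, which is the sense in which T(U) is congruent to U scaled by s.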


Figure 10.13.10 (a) Unit square U. (b) Unit square after similitude.

For our application to fractals, we will need only similitudes that are contractions, by which we mean that the scale factor s is restricted to the range 0 < s < 1.
Consequently, when we refer to similitudes we will always mean similitudes subject to this restriction.

Similitudes  are  important  in  the  study  of  fractals  because  of  the  following  fact: 


If $T: R^2 \to R^2$ is a similitude with scale factor s and if S is a closed and bounded set in $R^2$, then the image T(S) of the set S under T is congruent to S scaled
by s.


Recall from the definition of a self-similar set in $R^2$ that a closed and bounded set S in $R^2$ is self-similar if it can be expressed in the form

\[
S = S_1 \cup S_2 \cup S_3 \cup \cdots \cup S_k
\]

where $S_1, S_2, S_3, \ldots, S_k$ are nonoverlapping sets each of which is congruent to S scaled by the same factor s (0 < s < 1) [see 1]. In the following examples, we will
find similitudes that produce the sets $S_1, S_2, S_3, \ldots, S_k$ from S for the line segment, square, Sierpinski carpet, and Sierpinski triangle.

EXAMPLE  7 Line  Segment 


We  will  take  as  our  line  segment  the  line  segment  S connecting  the  points  (0,  0)  and  (1,  0)  in  the  xy-plane  (Figure  10.13.11a).  Consider  the  two 
similitudes 


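\[
T_1\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}, \qquad
T_2\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \frac{1}{2} \\ 0 \end{bmatrix} \tag{4}
\]

both of which have $s = \frac{1}{2}$ and $\theta = 0$. In Figure 10.13.11b we show how these two similitudes map the unit square U. The similitude $T_1$ maps U onto
the smaller square $T_1(U)$, and the similitude $T_2$ maps U onto the smaller square $T_2(U)$. At the same time, $T_1$ maps the line segment S onto the
smaller line segment $T_1(S)$, and $T_2$ maps S onto the smaller nonoverlapping line segment $T_2(S)$. The union of these two smaller nonoverlapping line
segments is precisely the original line segment S; that is,

\[
S = T_1(S) \cup T_2(S) \tag{5}
\]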


Figure 10.13.11


EXAMPLE  8 Square 


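Let us consider the unit square U in the xy-plane (Figure 10.13.12a) and the following four similitudes, all having $s = \frac{1}{2}$ and $\theta = 0$:

\[
T_i\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \end{bmatrix}, \qquad i = 1, 2, 3, 4 \tag{6}
\]

where the four values of $\begin{bmatrix} e_i \\ f_i \end{bmatrix}$ are

\[
\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} \frac{1}{2} \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ \frac{1}{2} \end{bmatrix}, \quad \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \end{bmatrix}
\]

The images of the unit square U under these four similitudes are the four squares shown in Figure 10.13.12b. Thus,

\[
U = T_1(U) \cup T_2(U) \cup T_3(U) \cup T_4(U) \tag{7}
\]

is a decomposition of U into four nonoverlapping squares that are congruent to U scaled by the same scale factor $\left(s = \frac{1}{2}\right)$.

Figure 10.13.12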


EXAMPLE  9 Sierpinski  Carpet 


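Let us consider a Sierpinski carpet S over the unit square U of the xy-plane (Figure 10.13.13a) and the following eight similitudes, all having $s = \frac{1}{3}$ and
$\theta = 0$:

\[
T_i\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{3}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \end{bmatrix}, \qquad i = 1, 2, 3, \ldots, 8 \tag{8}
\]

where the eight values of $\begin{bmatrix} e_i \\ f_i \end{bmatrix}$ are

\[
\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \;
\begin{bmatrix} \frac{1}{3} \\ 0 \end{bmatrix}, \;
\begin{bmatrix} \frac{2}{3} \\ 0 \end{bmatrix}, \;
\begin{bmatrix} 0 \\ \frac{1}{3} \end{bmatrix}, \;
\begin{bmatrix} \frac{2}{3} \\ \frac{1}{3} \end{bmatrix}, \;
\begin{bmatrix} 0 \\ \frac{2}{3} \end{bmatrix}, \;
\begin{bmatrix} \frac{1}{3} \\ \frac{2}{3} \end{bmatrix}, \;
\begin{bmatrix} \frac{2}{3} \\ \frac{2}{3} \end{bmatrix}
\]

The images of S under these eight similitudes are the eight sets shown in Figure 10.13.13b. Thus,

\[
S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_8(S) \tag{9}
\]

is a decomposition of S into eight nonoverlapping sets that are congruent to S scaled by the same scale factor $\left(s = \frac{1}{3}\right)$.

Figure 10.13.13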


EXAMPLE  10  Sierpinski  Triangle 

Let us consider a Sierpinski triangle S fitted inside the unit square U of the xy-plane, as shown in Figure 10.13.14a, and the following three similitudes,
all having $s = \frac{1}{2}$ and $\theta = 0$:

\[
T_i\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e_i \\ f_i \end{bmatrix}, \qquad i = 1, 2, 3 \tag{10}
\]

The images of S under these three similitudes are the three sets in Figure 10.13.14b. Thus,

\[
S = T_1(S) \cup T_2(S) \cup T_3(S) \tag{11}
\]

is a decomposition of S into three nonoverlapping sets that are congruent to S scaled by the same scale factor $\left(s = \frac{1}{2}\right)$.

Figure 10.13.14

In the preceding examples we started with a specific set S and showed that it was self-similar by finding similitudes $T_1, T_2, T_3, \ldots, T_k$ with the same scale factor
such that $T_1(S), T_2(S), T_3(S), \ldots, T_k(S)$ were nonoverlapping sets and such that

\[
S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_k(S) \tag{12}
\]

The  following  theorem  addresses  the  converse  problem  of  determining  a self-similar  set  from  a collection  of  similitudes. 


THEOREM  10.13.1 

If $T_1, T_2, T_3, \ldots, T_k$ are contracting similitudes with the same scale factor, then there is a unique nonempty closed and bounded set S in the Euclidean plane
such that

\[
S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_k(S)
\]

Furthermore, if the sets $T_1(S), T_2(S), T_3(S), \ldots, T_k(S)$ are nonoverlapping, then S is self-similar.


Algorithms  for  Generating  Fractals 


In  general,  there  is  no  simple  way  to  obtain  the  set  S in  the  preceding  theorem  directly.  We  now  describe  an  iterative  procedure  that  will  determine  S from  the 
similitudes  that  define  it.  We  first  give  an  example  of  the  procedure  and  then  give  an  algorithm  for  the  general  case. 

EXAMPLE  11  Sierpinski  Carpet 


Figure 10.13.15 shows the unit square region $S_0$ in the xy-plane, which will serve as an "initial" set for an iterative procedure for the construction of the
Sierpinski carpet. The set $S_1$ in the figure is the result of mapping $S_0$ with each of the eight similitudes $T_i$ (i = 1, 2, ..., 8) in 8 that determine the
Sierpinski carpet. It consists of eight square regions, each of side length $\frac{1}{3}$, surrounding an empty middle square. Next we apply the eight similitudes to
$S_1$ and arrive at the set $S_2$. Similarly, applying the eight similitudes to $S_2$ results in the set $S_3$. If we continue this process indefinitely, the sequence of
sets $S_1, S_2, S_3, \ldots$ will "converge" to a set S, which is the Sierpinski carpet.

Figure 10.13.15


Although  we  should  properly  give  a definition  of  what  it  means  for  a sequence  of  sets  to  “converge”  to  a given  set,  an  intuitive  interpretation  will  suffice  in 
this  introductory  treatment. 


Although we started in Figure 10.13.15 with the unit square region to arrive at the Sierpinski carpet, we could have started with any nonempty set $S_0$. The only
restriction is that the set $S_0$ be closed and bounded. For example, if we start with the particular set $S_0$ shown in Figure 10.13.16, then $S_1$ is the set obtained by
applying each of the eight similitudes in 8. Applying the eight similitudes to $S_1$ results in the set $S_2$. As before, applying the eight similitudes indefinitely yields the
Sierpinski carpet S as the limiting set.

Figure 10.13.16

The general algorithm illustrated in the preceding example is as follows: Let $T_1, T_2, T_3, \ldots, T_k$ be contracting similitudes with the same scale factor, and for an
arbitrary set Q in $R^2$, define the set $J(Q)$ by

\[
J(Q) = T_1(Q) \cup T_2(Q) \cup T_3(Q) \cup \cdots \cup T_k(Q)
\]

The following algorithm generates a sequence of sets $S_0, S_1, \ldots, S_n, \ldots$ that converges to the set S in Theorem 10.13.1.

Algorithm 1

Step 0 Choose an arbitrary nonempty closed and bounded set $S_0$ in $R^2$.

Step 1 Compute $S_1 = J(S_0)$.

Step 2 Compute $S_2 = J(S_1)$.

Step 3 Compute $S_3 = J(S_2)$.

Step n Compute $S_n = J(S_{n-1})$.
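Algorithm 1 can be sketched in Python for the Sierpinski carpet. This is a minimal illustration under stated assumptions, not the text's implementation: a set is stored as a finite collection of squares, the similitudes have no rotation, and the eight translations are taken to be every (a/3, b/3) with a, b in {0, 1, 2} except the centre. The names `CARPET_SHIFTS`, `J`, and `iterate` are illustrative.

```python
from fractions import Fraction as F

# Assumed translations (e_i, f_i) of the eight carpet similitudes:
# every (a/3, b/3) with a, b in {0, 1, 2} except the centre (1/3, 1/3).
CARPET_SHIFTS = [(F(a, 3), F(b, 3)) for a in range(3) for b in range(3) if (a, b) != (1, 1)]
S_FACTOR = F(1, 3)


def J(squares, shifts=CARPET_SHIFTS, s=S_FACTOR):
    """One step of Algorithm 1 for rotation-free similitudes.

    A set is stored as a Python set of squares (x, y, side), where (x, y)
    is the lower left corner; T_i maps it to (s*x + e_i, s*y + f_i, s*side).
    """
    return {(s * x + e, s * y + f, s * side) for (x, y, side) in squares for (e, f) in shifts}


def iterate(S0, n):
    """Return S_n = J(J(...J(S0)...)) with n applications of J."""
    S = S0
    for _ in range(n):
        S = J(S)
    return S


S0 = {(F(0), F(0), F(1))}          # the unit square region
S2 = iterate(S0, 2)
area = sum(side * side for (_, _, side) in S2)
print(len(S2), area)
```

Exact fractions are used so that squares produced along different routes compare equal; each application of J multiplies the number of squares by 8 and the total area by 8/9.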

EXAMPLE  12  Sierpinski  Triangle 

Let us construct the Sierpinski triangle determined by the three similitudes given in 10. The corresponding set mapping is
$J(Q) = T_1(Q) \cup T_2(Q) \cup T_3(Q)$. Figure 10.13.17 shows an arbitrary closed and bounded set $S_0$; the first four iterates $S_1$, $S_2$, $S_3$, $S_4$; and the limiting
set S (the Sierpinski triangle).

Figure 10.13.17

EXAMPLE 13 Using Algorithm 1


Consider  the  following  two  similitudes: 


The actions of these two similitudes on the unit square U are illustrated in Figure 10.13.18. Here, the rotation angle $\theta$ is a parameter that we will vary to
generate different self-similar sets. The self-similar sets determined by these two similitudes are shown in Figure 10.13.19 for various values of $\theta$. For
simplicity, we have not drawn the xy-axes, but in each case the origin is the lower left point of the set. These sets were generated on a computer using
Algorithm 1 for the various values of $\theta$. Because k = 2 and $s = \frac{1}{2}$, it follows from 2 that the Hausdorff dimension of these sets for any value of $\theta$ is 1. It
can be shown that the topological dimension of these sets is 1 for $\theta = 0$ and 0 for all other values of $\theta$. It follows that the self-similar set for $\theta = 0$ is not
a fractal [it is the straight line segment from (0, 0) to (.6, .6)], while the self-similar sets for all other values of $\theta$ are fractals. In particular, they are
examples of fractals with integer Hausdorff dimension.

Figure 10.13.19


A Monte  Carlo  Approach 


The  set-mapping  approach  of  constructing  self-similar  sets  described  in  Algorithm  1 is  rather  time-consuming  on  a computer  because  the  similitudes  involved  must  be 
applied  to  each  of  the  many  computer  screen  pixels  in  the  successive  iterated  sets.  In  1985  Michael  Barnsley  described  an  alternative,  more  practical  method  of 
generating  a self-similar  set  defined  through  its  similitudes.  It  is  a so-called  Monte  Carlo  method  that  takes  advantage  of  probability  theory.  Barnsley  refers  to  it  as 
the  Random  Iteration  Algorithm. 


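Let $T_1, T_2, T_3, \ldots, T_k$ be contracting similitudes with the same scale factor. The following algorithm generates a sequence of points

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}, \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}, \ldots, \begin{bmatrix} x_n \\ y_n \end{bmatrix}, \ldots
\]

that collectively converge to the set S in Theorem 10.13.1.

Algorithm 2

Step 0 Choose an arbitrary point $\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$ in S.

Step 1 Choose one of the k similitudes at random, say $T_{k_1}$, and compute

\[
\begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = T_{k_1}\left(\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}\right)
\]

Step 2 Choose one of the k similitudes at random, say $T_{k_2}$, and compute

\[
\begin{bmatrix} x_2 \\ y_2 \end{bmatrix} = T_{k_2}\left(\begin{bmatrix} x_1 \\ y_1 \end{bmatrix}\right)
\]

Step n Choose one of the k similitudes at random, say $T_{k_n}$, and compute

\[
\begin{bmatrix} x_n \\ y_n \end{bmatrix} = T_{k_n}\left(\begin{bmatrix} x_{n-1} \\ y_{n-1} \end{bmatrix}\right)
\]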


On  a computer  screen  the  pixels  corresponding  to  the  points  generated  by  this  algorithm  will  fill  out  the  pixel  representation  of  the  limiting  set  S. 

Figure  10.13.20  shows  four  stages  of  the  Random  Iteration  Algorithm  that  generate  the  Sierpinski  carpet,  starting  with  the  initial  point  . 

Although Step 0 in the preceding algorithm requires the selection of an initial point in the set S, which may not be known in advance, this is not a serious
problem. In practice, one can usually start with any point in $R^2$ and after a few iterations (say ten or so), the point generated will be sufficiently close to S that the
algorithm will work correctly from that point on.

Figure 10.13.20
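The Random Iteration Algorithm is short enough to sketch directly. The Python below is an illustration under assumptions rather than Barnsley's own code: it handles only rotation-free similitudes, it assumes the eight Sierpinski carpet translations are every (a/3, b/3) except the centre, and it discards the first ten points as suggested above. The names `random_iteration` and `carpet_shifts` are illustrative.

```python
import random


def random_iteration(shifts, s, n_points, start=(0.0, 0.0), burn_in=10, seed=0):
    """Algorithm 2 for rotation-free similitudes T_i(x, y) = s*(x, y) + shifts[i].

    The first `burn_in` points are discarded so that an arbitrary starting
    point has time to move close to the limiting set S.
    """
    rng = random.Random(seed)
    x, y = start
    points = []
    for step in range(burn_in + n_points):
        e, f = rng.choice(shifts)          # pick one similitude at random
        x, y = s * x + e, s * y + f        # apply it to the current point
        if step >= burn_in:
            points.append((x, y))
    return points


# Assumed carpet translations: all (a/3, b/3) except the centre square.
carpet_shifts = [(a / 3, b / 3) for a in range(3) for b in range(3) if (a, b) != (1, 1)]
points = random_iteration(carpet_shifts, 1 / 3, 5000)
print(points[:3])
```

Plotting `points` (for example as single pixels) fills out an approximation of the carpet; no point lands in the open middle square, because each step places the point inside one of the eight outer sub-squares.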


More  General  Fractals 

So far, we have discussed fractals that are self-similar sets according to the definition of a self-similar set in $R^2$. However, Theorem 10.13.1 remains true if the
similitudes $T_1, T_2, \ldots, T_k$ are replaced by more general transformations, called contracting affine transformations. An affine transformation is defined as follows:

DEFINITION 5

An affine transformation is a mapping of $R^2$ into $R^2$ of the form

\[
T\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix}
\]

where a, b, c, d, e, and f are scalars.


Figure 10.13.21 shows how an affine transformation maps the unit square U onto a parallelogram T(U). An affine transformation is said to be contracting if the
Euclidean distance between any two points in the plane is strictly decreased after the two points are mapped by the transformation. It can be shown that any k
contracting affine transformations $T_1, T_2, \ldots, T_k$ determine a unique closed and bounded set S satisfying the equation

\[
S = T_1(S) \cup T_2(S) \cup T_3(S) \cup \cdots \cup T_k(S) \tag{13}
\]
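Whether a given affine transformation is contracting depends only on its matrix part, since the translation cancels when two image points are subtracted. One standard test, sketched below in Python, is that the largest singular value of the matrix is less than 1. The function names and the sample matrices are illustrative assumptions, not values from the text.

```python
import math


def spectral_norm(a, b, c, d):
    """Largest singular value of the matrix [[a, b], [c, d]]."""
    # The eigenvalues of A^T A are (t +/- sqrt(t^2 - 4*det^2)) / 2, where
    # t = a^2 + b^2 + c^2 + d^2 and det = a*d - b*c.
    t = a * a + b * b + c * c + d * d
    det = a * d - b * c
    disc = max(t * t - 4 * det * det, 0.0)
    return math.sqrt((t + math.sqrt(disc)) / 2)


def is_contracting(a, b, c, d):
    """True if every distance is strictly decreased by x -> A x + (e, f)."""
    return spectral_norm(a, b, c, d) < 1


theta = 0.7                                    # illustrative angle
half_rotation = (0.5 * math.cos(theta), -0.5 * math.sin(theta),
                 0.5 * math.sin(theta), 0.5 * math.cos(theta))
print(spectral_norm(*half_rotation))           # a similitude with s = 1/2
print(is_contracting(1, 1, 0, 1))              # a shear stretches some directions
```

For a similitude the largest singular value equals the scale factor s, so the restriction 0 < s < 1 is the same condition in that special case.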


Equation 13 has the same form as Equation 12, which we used to find self-similar sets. Although Equation 13, which uses contracting affine transformations, does not
determine a self-similar set S, the set it does determine has many of the features of self-similar sets. For example, Figure 10.13.22 shows how a set in the plane
resembling a fern (an example made famous by Barnsley) can be generated through four contracting affine transformations. Note that the middle fern is the slightly
overlapping union of the four smaller affine-image ferns surrounding it. Note also how $T_3$, because the determinant of its matrix part is zero, maps the entire fern onto
the small straight line segment between the points (.50, 0) and (.50, .16). Figure 10.13.22 contains a wealth of information and should be studied carefully.

Michael Barnsley has applied the above theory to the field of data compression and transmission. The fern, for example, is completely determined by the four affine
transformations $T_1$, $T_2$, $T_3$, $T_4$. These four transformations, in turn, are determined by the 24 numbers given in Figure 10.13.22 defining their corresponding values
of a, b, c, d, e, and f. In other words, these 24 numbers completely encode the picture of the fern. Storing these 24 numbers in a computer requires considerably less
memory space than storing a pixel-by-pixel description of the fern. In principle, any picture represented by a pixel map on a computer screen can be described
through a finite number of affine transformations, although it is not easy to determine which transformations to use. Nevertheless, once encoded, the affine
transformations generally require several orders of magnitude less computer memory than a pixel-by-pixel description of the pixel map.


Further  Readings 

Readers  interested  in  learning  more  about  fractals  are  referred  to  the  following  books,  the  first  of  which  elaborates  on  the  linear  transformation  approach  of 
this  section. 

1.  Michael  Barnsley,  Fractals  Everywhere  (New  York:  Academic  Press,  1993). 

2.  Benoit  B.  Mandelbrot,  The  Fractal  Geometry  of  Nature  (New  York:  W.  H.  Freeman,  1982). 

3. Heinz-Otto Peitgen and P. H. Richter, The Beauty of Fractals (New York: Springer-Verlag, 1986).

4. Heinz-Otto Peitgen and Dietmar Saupe, The Science of Fractal Images (New York: Springer-Verlag, 1988).


Exercise  Set  10.13 

1. The self-similar set in Figure Ex-1 has the sizes indicated. Given that its lower left corner is situated at the origin of the xy-plane, find the similitudes that
determine the set. What is its Hausdorff dimension? Is it a fractal?

Figure Ex-1


Answer: 

$d_H(S) = \ln(4) / \ln\left(\frac{25}{12}\right) = 1.888\ldots$

2. Find the Hausdorff dimension of the self-similar set shown in Figure Ex-2. Use a ruler to measure the figure and determine an approximate value of the scale factor
s. What are the rotation angles of the similitudes determining this set?

Figure Ex-2

Answer:

$s \approx .47$; $d_H(S) \approx \ln(4) / \ln(1/.47) = 1.8\ldots$. Rotation angles: 0° (upper left); −90° (upper right); 180° (lower left); 180° (lower right)

3. Each of the 12 self-similar sets in Figure Ex-3 results from three similitudes with scale factor of $\frac{1}{2}$, and so all have Hausdorff dimension ln 3 / ln 2 = 1.584.... The
rotation angles of the three similitudes are all multiples of 90°. Find these rotation angles for each set and express them as a triplet of integers $(n_1, n_2, n_3)$, where
$n_i$ is the corresponding integer multiple of 90° in the order upper right, lower left, lower right. For example, the first set (the Sierpinski triangle) generates the
triplet (0, 0, 0).


Answer: 

(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (0, 0, 1), (0, 0, 2), (1, 2, 0), (2, 1, 3), (2, 0, 1), (2, 0, 2), (2, 2, 0), (0, 3, 3)

4. For each of the self-similar sets in Figure Ex-4, find:

(i)  the  scale  factor  s of  the  similitudes  describing  the  set; 

(ii)  the  rotation  angles  0 of  all  similitudes  describing  the  set  (all  rotation  angles  are  multiples  of  90°);  and 

(iii)  the  Hausdorff  dimension  of  the  set. 

Which of the sets are fractals and why?

Figure Ex-4

Answer:

(a) (i) $s = \frac{1}{3}$; (ii) all rotation angles are 0°; (iii) $d_H(S) = \ln(7) / \ln(3) = 1.771\ldots$. This set is a fractal.

(b) (i) $s = \frac{1}{2}$; (ii) all rotation angles are 180°; (iii) $d_H(S) = \ln(3) / \ln(2) = 1.584\ldots$. This set is a fractal.

(c) (i) $s = \frac{1}{2}$; (ii) rotation angles: −90° (top); 180° (lower left); 180° (lower right); (iii) $d_H(S) = \ln(3) / \ln(2) = 1.584\ldots$. This set is a fractal.

(d) (i) $s = \frac{1}{2}$; (ii) rotation angles: 90° (upper left); 180° (upper right); 180° (lower right); (iii) $d_H(S) = \ln(3) / \ln(2) = 1.584\ldots$. This set is a fractal.

5. Show that of the four affine transformations shown in Figure 10.13.22, only the transformation $T_2$ is a similitude. Determine its scale factor s and rotation angle $\theta$.

Answer:

$s = .8509\ldots$, $\theta = -2.69\ldots°$

6. Find the coordinates of the tip of the fern in Figure 10.13.22. [Hint: The transformation $T_2$ maps the tip of the fern to itself.]

Answer: 

(0.766,  0.996)  rounded  to  three  decimal  places 

7. The square in Figure 10.13.7a was expressed as the union of 4 nonoverlapping squares as in Figure 10.13.7b. Suppose that it is expressed instead as the union of
16 nonoverlapping squares. Verify that its Hausdorff dimension is still 2, as determined by Equation 2.

Answer:

$d_H(S) = \ln(16) / \ln(4) = 2$

8. Show that the four similitudes

express the unit square as the union of four overlapping squares. Evaluate the right-hand side of Equation 2 for the values of k and s determined by these
similitudes, and show that the result is not the correct value of the Hausdorff dimension of the unit square. [Note: This exercise shows the necessity of the
nonoverlapping condition in the definition of a self-similar set and its Hausdorff dimension.]

Answer:

$\ln(4) / \ln\left(\frac{4}{3}\right) = 4.818\ldots$

9. All of the results in this section can be extended to $R^n$. Compute the Hausdorff dimension of the unit cube in $R^3$ (see Figure Ex-9). Given that the topological
dimension of the unit cube is 3, determine whether it is a fractal. [Hint: Express the unit cube as the union of eight smaller congruent nonoverlapping cubes.]

Figure Ex-9

Answer:

$d_H(S) = \ln(8) / \ln(2) = 3$; the cube is not a fractal.

10. The set in $R^3$ in Figure Ex-10 is called the Menger sponge. It is a self-similar set obtained by drilling out certain square holes from the unit cube. Note that each
face of the Menger sponge is a Sierpinski carpet and that the holes in the Sierpinski carpet now run all the way through the Menger sponge. Determine the values
of k and s for the Menger sponge and find its Hausdorff dimension. Is the Menger sponge a fractal?

Answer:

k = 20; $s = \frac{1}{3}$; $d_H(S) = \ln(20) / \ln(3) = 2.726\ldots$; the set is a fractal.

11. The two similitudes

determine a fractal known as the Cantor set. Starting with the unit square region U as an initial set, sketch the first four sets that Algorithm 1 determines. Also,
find  the  Hausdorff  dimension  of  the  Cantor  set.  (This  famous  set  was  the  first  example  that  Hausdorff  gave  in  his  1919  paper  of  a set  whose  Hausdorff  dimension 
is  not  equal  to  its  topological  dimension.) 


Answer: 


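[Sketches: initial set, first iterate, second iterate, third iterate, fourth iterate]

$d_H(S) = \ln(2) / \ln(3) = 0.6309\ldots$

12. Compute the areas of the sets $S_0$, $S_1$, $S_2$, $S_3$, and $S_4$ in Figure 10.13.15.

Answer:

Area of $S_0$ = 1; area of $S_1 = \frac{8}{9} = 0.888\ldots$; area of $S_2 = \left(\frac{8}{9}\right)^2 = 0.790\ldots$; area of $S_3 = \left(\frac{8}{9}\right)^3 = 0.702\ldots$; area of $S_4 = \left(\frac{8}{9}\right)^4 = 0.624\ldots$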

Technology  Exercises 

The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may
also  be  some  other  type  of  linear  algebra  software  or  a scientific  calculator  with  some  linear  algebra  capabilities.  For  each  exercise  you  will  need  to  read  the  relevant 
documentation  for  the  particular  utility  you  are  using.  The  goal  of  these  exercises  is  to  provide  you  with  a basic  proficiency  with  your  technology  utility.  Once  you 
have  mastered  the  techniques  in  these  exercises,  you  will  be  able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the  regular  exercise  sets. 


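T1. Use similitudes of the form

\[
T_i\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right) = \frac{1}{3}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} + \begin{bmatrix} a_i \\ b_i \\ c_i \end{bmatrix}
\]

to show that the Menger sponge (see Exercise 10) is the set S satisfying

\[
S = \bigcup_{i=1}^{20} T_i(S)
\]

for appropriately chosen similitudes $T_i$ (for i = 1, 2, 3, ..., 20). Determine these similitudes by determining the collection of 3 x 1 matrices

\[
\begin{bmatrix} a_i \\ b_i \\ c_i \end{bmatrix} \quad \text{for } i = 1, 2, 3, \ldots, 20
\]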

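T2. Generalize the ideas involved in the Cantor set (in $R^1$), the Sierpinski carpet (in $R^2$), and the Menger sponge (in $R^3$) to $R^n$ by considering the set S satisfying

\[
S = \bigcup_{i=1}^{m_n} T_i(S)
\]

with

\[
T_i\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix}\right) = \frac{1}{3}\begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} a_{1i} \\ a_{2i} \\ a_{3i} \\ \vdots \\ a_{ni} \end{bmatrix}
\]

where each $a_{ki}$ equals 0, $\frac{1}{3}$, or $\frac{2}{3}$, and no two of them ever equal $\frac{1}{3}$ at the same time. Use a computer to construct the set

\[
\begin{bmatrix} a_{1i} \\ a_{2i} \\ a_{3i} \\ \vdots \\ a_{ni} \end{bmatrix} \quad \text{for } i = 1, 2, 3, \ldots, m_n
\]

thereby determining the value of $m_n$ for n = 2, 3, 4. Then develop an expression for $m_n$.

10.14 Chaos

In this section we use a map of the unit square in the xy-plane onto itself to describe the concept of a chaotic mapping.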


Prerequisites 

Geometry of Linear Operators on $R^2$ (Section 4.11)

Eigenvalues  and  Eigenvectors 

Intuitive  Understanding  of  Limits  and  Continuity 


Chaos 

The  word  chaos  was  first  used  in  a mathematical  sense  in  1975  by  Tien-Yien  Li  and  James  Yorke  in  a paper  entitled  “Period 
Three  Implies  Chaos.”  The  term  is  now  used  to  describe  the  behavior  of  certain  mathematical  mappings  and  physical  phenomena 
that  at  first  glance  seem  to  behave  in  a random  or  disorderly  fashion  but  actually  have  an  underlying  element  of  order  (examples 
include  random-number  generation,  shuffling  cards,  cardiac  arrhythmia,  fluttering  airplane  wings,  changes  in  the  red  spot  of 
Jupiter,  and  deviations  in  the  orbit  of  Pluto).  In  this  section  we  discuss  a particular  chaotic  mapping  called  Arnold's  cat  map , after 
the  Russian  mathematician  Vladimir  I.  Arnold  who  first  described  it  using  a diagram  of  a cat. 


Arnold's  Cat  Map 


To describe Arnold's cat map, we need a few ideas about modular arithmetic. If x is a real number, then the notation x mod 1
denotes the unique number in the interval [0, 1) that differs from x by an integer. For example,

2.3 mod 1 = 0.3, 0.9 mod 1 = 0.9, −3.7 mod 1 = 0.3, 2.0 mod 1 = 0

Note that if x is a nonnegative number, then x mod 1 is simply the fractional part of x. If (x, y) is an ordered pair of real numbers,
then the notation (x, y) mod 1 denotes (x mod 1, y mod 1). For example,

(2.3, −7.9) mod 1 = (0.3, 0.1)

Observe that for every real number x, the point x mod 1 lies in the unit interval [0, 1) and that for every ordered pair (x, y), the
point (x, y) mod 1 lies in the unit square

\[
S = \{ (x, y) \mid 0 \le x < 1, \; 0 \le y < 1 \}
\]

Also  observe  that  the  upper  boundary  and  the  right-hand  boundary  of  the  square  are  not  included  in  S. 
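The mod 1 operation is easy to experiment with. The sketch below uses Python's `%` operator, whose result takes the sign of the divisor and therefore lands in [0, 1) for negative inputs as well; exact fractions are used so that the worked values above are reproduced without rounding. The helper names `mod1` and `mod1_pair` are illustrative.

```python
from fractions import Fraction


def mod1(x):
    """Return the unique number in [0, 1) that differs from x by an integer."""
    return x % 1


def mod1_pair(point):
    """Apply mod 1 to each coordinate of an ordered pair."""
    x, y = point
    return (mod1(x), mod1(y))


examples = [Fraction("2.3"), Fraction("0.9"), Fraction("-3.7"), Fraction("2.0")]
print([str(mod1(x)) for x in examples])
print(mod1_pair((Fraction("2.3"), Fraction("-7.9"))))
```

With ordinary floating-point inputs the same function works, but results such as −3.7 mod 1 come out as a value very close to 0.3 rather than exactly 0.3.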


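Arnold's cat map is the transformation $\Gamma : R^2 \to R^2$ defined by the formula

\[
\Gamma : (x, y) \to (x + y, \; x + 2y) \bmod 1
\]

or, in matrix notation,

\[
\Gamma\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1 \tag{1}
\]

To understand the geometry of Arnold's cat map, it is helpful to write 1 in the factored form

\[
\Gamma\left(\begin{bmatrix} x \\ y \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \bmod 1
\]

which expresses Arnold's cat map as the composition of a shear in the x-direction with factor 1, followed by a shear in the
y-direction with factor 1. Because the computations are performed mod 1, $\Gamma$ maps all points of $R^2$ into the unit square S.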


We  will  illustrate  the  effect  of  Arnold's  cat  map  on  the  unit  square  S,  which  is  shaded  in  Figure  10.14.1a  and  contains  a picture  of 
a cat.  It  can  be  shown  that  it  does  not  matter  whether  the  mod  1 computations  are  carried  out  after  each  shear  or  at  the  very  end. 
We  will  discuss  both  methods,  first  performing  them  at  the  end.  The  steps  are  as  follows: 

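Step 1 Shear in the x-direction with factor 1 (Figure 10.14.1b):

\[
(x, y) \to (x + y, \; y)
\]

or, in matrix notation,

\[
\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x + y \\ y \end{bmatrix}
\]

Step 2 Shear in the y-direction with factor 1 (Figure 10.14.1c):

\[
(x, y) \to (x, \; x + y)
\]

or, in matrix notation,

\[
\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ x + y \end{bmatrix}
\]

Step 3 Reassembly into S (Figure 10.14.1d):

\[
(x, y) \to (x, y) \bmod 1
\]

The geometric effect of the mod 1 arithmetic is to break up the parallelogram in Figure 10.14.1c and reassemble the pieces of S as
shown in Figure 10.14.1d.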


Figure 10.14.1

For computer implementation, it is more convenient to perform the mod 1 arithmetic at each step, rather than at the end. With this
approach there is a reassembly at each step, but the net effect is the same. The steps are as follows:

Step 1 Shear in the x-direction with factor 1, followed by a reassembly into S (Figure 10.14.2b):

\[
(x, y) \to (x + y, \; y) \bmod 1
\]

Step 2 Shear in the y-direction with factor 1, followed by a reassembly into S (Figure 10.14.2c):

\[
(x, y) \to (x, \; x + y) \bmod 1
\]


Figure  10.14.2 
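The claim that reducing mod 1 after each shear gives the same result as reducing once at the end can be checked on sample points. The following Python sketch uses exact fractions so that the comparison is not affected by rounding; the function names `cat_map` and `cat_map_stepwise` are illustrative.

```python
from fractions import Fraction


def cat_map(point):
    """Arnold's cat map with a single mod 1 at the end: (x + y, x + 2y) mod 1."""
    x, y = point
    return ((x + y) % 1, (x + 2 * y) % 1)


def cat_map_stepwise(point):
    """The same map with a reassembly (mod 1) after each shear."""
    x, y = point
    x, y = (x + y) % 1, y % 1      # Step 1: shear in the x-direction, then mod 1
    x, y = x % 1, (x + y) % 1      # Step 2: shear in the y-direction, then mod 1
    return (x, y)


samples = [(Fraction(a, 7), Fraction(b, 5)) for a in range(-10, 11) for b in range(-10, 11)]
agree = all(cat_map(p) == cat_map_stepwise(p) for p in samples)
print(agree)
print(cat_map((Fraction(1, 2), Fraction(3, 4))))
```

The two versions agree because adding or removing integers from x or y before a shear with integer entries changes the result only by integers, which the final mod 1 removes.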


Repeated  Mappings 


Chaotic  mappings  such  as  Arnold's  cat  map  usually  arise  in  physical  models  in  which  an  operation  is  performed  repeatedly.  For 
example,  cards  are  mixed  by  repeated  shuffles,  paint  is  mixed  by  repeated  stirs,  water  in  a tidal  basin  is  mixed  by  repeated  tidal 
changes,  and  so  forth.  Thus,  we  are  interested  in  examining  the  effect  on  S of  repeated  applications  (or  iterations)  of  Arnold's  cat 
map.  Figure  10.14.3,  which  was  generated  on  a computer,  shows  the  effect  of  25  iterations  of  Arnold's  cat  map  on  the  cat  in  the 
unit  square  S.  Two  interesting  phenomena  occur: 

The  cat  returns  to  its  original  form  at  the  25th  iteration. 

At  some  of  the  intermediate  iterations,  the  cat  is  decomposed  into  streaks  that  seem  to  have  a specific  direction. 

Much  of  the  remainder  of  this  section  is  devoted  to  explaining  these  phenomena. 


Figure  10.14.3 


Periodic  Points 

Our  first  goal  is  to  explain  why  the  cat  in  Figure  10.14.3  returns  to  its  original  configuration  at  the  25th  iteration.  For  this  purpose 
it  will  be  helpful  to  think  of  a picture  in  the  xy-plane  as  an  assignment  of  colors  to  the  points  in  the  plane.  For  pictures  generated 
on  a computer  screen  or  other  digital  device,  hardware  limitations  require  that  a picture  be  broken  up  into  discrete  squares,  called 
pixels.  For  example,  in  the  computer-generated  pictures  in  Figure  10.14.3  the  unit  square  S is  divided  into  a grid  with  101  pixels 
on a side for a total of 10,201 pixels, each of which is black or white (Figure 10.14.4). An assignment of colors to pixels to create
a picture is called a pixel map.


Enlarged  view  of  cat’s  face 
showing  individual  pixels 


Figure  10.14.4 

As shown in Figure 10.14.5, each pixel in S can be assigned a unique pair of coordinates of the form (m/101, n/101) that
identifies its lower left-hand corner, where m and n are integers in the range 0, 1, 2, ..., 100. We call these points pixel points
because each such point identifies a unique pixel. Instead of restricting the discussion to the case where S is subdivided into an
array with 101 pixels on a side, let us consider the more general case where there are p pixels per side. Thus, each pixel map in S
consists of p² pixels uniformly spaced 1/p units apart in both the x- and the y-directions. The pixel points in S have coordinates
of the form (m/p, n/p), where m and n are integers ranging from 0 to p − 1.




Figure  10.14.5 


Under Arnold's cat map each pixel point of S is transformed into another pixel point of S. To see why this is so, observe that the
image of the pixel point (m/p, n/p) under Γ is given in matrix form by

\[
\Gamma\left(\begin{bmatrix} m/p \\ n/p \end{bmatrix}\right)
= \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} m/p \\ n/p \end{bmatrix} \bmod 1
= \begin{bmatrix} (m+n)/p \\ (m+2n)/p \end{bmatrix} \bmod 1 \tag{2}
\]

The ordered pair ((m + n)/p, (m + 2n)/p) mod 1 is of the form (m′/p, n′/p), where m′ and n′ lie in the range
0, 1, 2, ..., p − 1. Specifically, m′ and n′ are the remainders when m + n and m + 2n are divided by p, respectively.
Consequently, each point in S of the form (m/p, n/p) is mapped onto another point of the same form.

Because Arnold's cat map transforms every pixel point of S into another pixel point of S, and because there are only p² different
pixel points in S, it follows that any given pixel point must return to its original position after at most p² iterations of Arnold's cat
map.


EXAMPLE  1 Using  Formula  2 


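If p = 76, then (2) becomes

\[
\Gamma\left(\begin{bmatrix} m/76 \\ n/76 \end{bmatrix}\right)
= \begin{bmatrix} (m+n)/76 \\ (m+2n)/76 \end{bmatrix} \bmod 1
\]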
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In  this  case  the  successive  iterates  of  the  point  f — , — | 
(verify). Because the point returns to its initial position on the ninth application of Arnold's cat map (but no sooner),
the  point  is  said  to  have  period  9,  and  the  set  of  nine  distinct  iterates  of  the  point  is  called  a 9-cycle.  Figure  10.14.6 
shows  this  9-cycle  with  the  initial  point  labeled  0 and  its  successive  iterates  labeled  accordingly. 


Figure  10.14.6 


In  general,  a point  that  returns  to  its  initial  position  after  n applications  of  Arnold’s  cat  map,  but  does  not  return  with  fewer  than  n 
applications,  is  said  to  have  period  n , and  its  set  of  n distinct  iterates  is  called  an  n-cycle.  Arnold's  cat  map  maps  (0,  0)  into 
(0,  0),  so  this  point  has  period  1.  Points  with  period  1 are  also  called  fixed  points.  We  leave  it  as  an  exercise  (Exercise  11)  to 
show  that  (0,  0)  is  the  only  fixed  point  of  Arnold's  cat  map. 
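The definition of period lends itself to a direct computation. The following is a minimal sketch, assuming pixel points are stored as integer pairs (m, n) standing for (m/p, n/p); the names `cat_step` and `period` are hypothetical. The starting point (27/76, 58/76) is an illustrative choice made here, not a value taken from the text.

```python
def cat_step(m, n, p):
    """One application of the cat map to the pixel point (m/p, n/p)."""
    return (m + n) % p, (m + 2 * n) % p

def period(m, n, p):
    """Number of applications needed for (m/p, n/p) to first return to itself."""
    start = (m % p, n % p)
    point = cat_step(*start, p)
    count = 1
    while point != start:
        point = cat_step(*point, p)
        count += 1
    return count

print(period(27, 58, 76))  # an illustrative point with p = 76
print(period(0, 0, 76))    # the origin is a fixed point
```

The loop terminates because the map is one-to-one on the finite set of pixel points, so every orbit eventually returns to its starting point.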


Period  Versus  Pixel  Width 

If P₁ and P₂ are points with periods q₁ and q₂, respectively, then P₁ returns to its initial position in q₁ iterations (but no sooner),
and P₂ returns to its initial position in q₂ iterations (but no sooner); thus, both points return to their initial positions in any number
of iterations that is a multiple of both q₁ and q₂. In general, for a pixel map with p² pixel points of the form (m/p, n/p), we let
Π(p) denote the least common multiple of the periods of all the pixel points in the map [i.e., Π(p) is the smallest integer that is
divisible by all of the periods]. It follows that the pixel map will return to its initial configuration in Π(p) iterations of Arnold's
cat map (but no sooner). For this reason, we call Π(p) the period of the pixel map. In Exercise 4 we ask you to show that if
p = 101, then all pixel points have period 1, 5, or 25, so Π(101) = 25. This explains why the cat in Figure 10.14.3 returned to
its initial configuration in 25 iterations.
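As a rough illustration of this definition, Π(p) can be computed by brute force as the least common multiple of the periods of all p² pixel points. The sketch below assumes integer pairs (m, n) stand for the pixel points (m/p, n/p); the function names are hypothetical, and the approach is only practical for small p.

```python
from math import gcd

def cat_step(m, n, p):
    """One application of the cat map to the pixel point (m/p, n/p)."""
    return (m + n) % p, (m + 2 * n) % p

def point_period(m, n, p):
    """Period of the single pixel point (m/p, n/p)."""
    start = (m, n)
    point = cat_step(m, n, p)
    count = 1
    while point != start:
        point = cat_step(*point, p)
        count += 1
    return count

def pixel_map_period(p):
    """Least common multiple of the periods of all p*p pixel points."""
    result = 1
    for m in range(p):
        for n in range(p):
            q = point_period(m, n, p)
            result = result * q // gcd(result, q)
    return result

print(pixel_map_period(101))
```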

Figure  10.14.7  shows  how  the  period  of  a pixel  map  varies  with  p.  Although  the  general  tendency  is  for  the  period  to  increase  as  p 
increases,  there  is  a surprising  amount  of  irregularity  in  the  graph.  Indeed,  there  is  no  simple  function  that  specifies  this 
relationship  (see  Exercise  1). 
Figure 10.14.7

Although a pixel map with p pixels on a side does not return to its initial configuration until Π(p) iterations have occurred,
various unexpected things can occur at intermediate iterations. For example, Figure 10.14.8 shows a pixel map with p = 250 of
the famous Hungarian-American mathematician John von Neumann. It can be shown that Π(250) = 750; hence, the pixel map
will return to its initial configuration after 750 iterations of Arnold's cat map (but no sooner). However, after 375 iterations the
pixel map is turned upside down, and after another 375 iterations (for a total of 750) the pixel map is returned to its initial
configuration. Moreover, there are so many pixel points with periods that divide 750 that multiple ghostlike images of the original
likeness occur at intermediate iterations; at 195 iterations numerous miniatures of the original likeness occur in diagonal rows.


The  Tiled  Plane 


Our  next  objective  is  to  explain  the  cause  of  the  linear  streaks  that  occur  in  Figure  10.14.3.  For  this  purpose  it  will  be  helpful  to 
view  Arnold's  cat  map  another  way.  As  defined,  Arnold's  cat  map  is  not  a linear  transformation  because  of  the  mod  1 arithmetic. 
However,  there  is  an  alternative  way  of  defining  Arnold's  cat  map  that  avoids  the  mod  1 arithmetic  and  results  in  a linear 
transformation.  For  this  purpose,  imagine  that  the  unit  square  S with  its  picture  of  the  cat  is  a “tile,”  and  suppose  that  the  entire 
plane  is  covered  with  such  tiles,  as  in  Figure  10.14.9.  We  say  that  the  xy-plane  has  been  tiled  with  the  unit  square.  If  we  apply  the 
matrix transformation in (1) to the entire tiled plane without performing the mod 1 arithmetic, then it can be shown that the portion
of  the  image  within  S will  be  identical  to  the  image  that  we  obtained  using  the  mod  1 arithmetic  (Figure  10.14.9).  In  short,  the 
tiling  results  in  the  same  pixel  map  in  S as  the  mod  1 arithmetic,  but  in  the  tiled  case  Arnold's  cat  map  is  a linear  transformation. 
Figure 10.14.9


It  is  important  to  understand,  however,  that  tiling  and  mod  1 arithmetic  produce  periodicity  in  different  ways.  If  a pixel  map  in  S 
has  period  n , then  in  the  case  of  mod  1 arithmetic,  each  point  returns  to  its  original  position  at  the  end  of  n iterations.  In  the  case 
of  tiling,  points  need  not  return  to  their  original  positions;  rather,  each  point  is  replaced  by  a point  of  the  same  color  at  the  end  of  n 
iterations. 


Properties  of  Arnold's  Cat  Map 


To  understand  the  cause  of  the  streaks  in  Figure  10.14.3,  think  of  Arnold's  cat  map  as  a linear  transformation  on  the  tiled  plane. 
Observe  that  the  matrix 


\[
C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
\]

that defines Arnold's cat map is symmetric and has a determinant of 1. The fact that the determinant is 1 means that multiplication
by this matrix preserves areas; that is, the area of any figure in the plane and the area of its image are the same. This is also true
for figures in S in the case of mod 1 arithmetic, since the effect of the mod 1 arithmetic is to cut up the figure and reassemble the
pieces without any overlap, as shown in Figure 10.14.1d. Thus, in Figure 10.14.3 the area of the cat (whatever it is) is the same as
the total area of the blotches in each iteration.


The  fact  that  the  matrix  is  symmetric  means  that  its  eigenvalues  are  real  and  the  corresponding  eigenvectors  are  perpendicular.  We 
leave  it  for  you  to  show  that  the  eigenvalues  and  corresponding  eigenvectors  of  C are 

\[
\lambda_1 = \frac{3+\sqrt{5}}{2} = 2.6180\ldots, \qquad \lambda_2 = \frac{3-\sqrt{5}}{2} = 0.3819\ldots,
\]

\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ \frac{1+\sqrt{5}}{2} \end{bmatrix} = \begin{bmatrix} 1 \\ 1.6180\ldots \end{bmatrix}, \qquad
\mathbf{v}_2 = \begin{bmatrix} \frac{-1-\sqrt{5}}{2} \\ 1 \end{bmatrix} = \begin{bmatrix} -1.6180\ldots \\ 1 \end{bmatrix}
\]

For each application of Arnold's cat map, the eigenvalue λ₁ causes a stretching in the direction of the eigenvector v₁ by a factor
of 2.6180..., and the eigenvalue λ₂ causes a compression in the direction of the eigenvector v₂ by a factor of 0.3819.... Figure
10.14.10 shows a square centered at the origin whose sides are parallel to the two eigenvector directions. Under the above
mapping, this square is deformed into the rectangle whose sides are also parallel to the two eigenvector directions. The areas of the
square and rectangle are the same.
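The stated eigenvalues and eigenvectors can be checked numerically. The sketch below uses only the standard `math` module; the helper name `apply` and the variable names are hypothetical, and floating-point comparisons are made with a small tolerance.

```python
import math

C = ((1, 1), (1, 2))
sqrt5 = math.sqrt(5)

lam1 = (3 + sqrt5) / 2            # stretching factor, about 2.6180
lam2 = (3 - sqrt5) / 2            # compression factor, about 0.3819
v1 = (1.0, (1 + sqrt5) / 2)       # direction of stretching
v2 = ((-1 - sqrt5) / 2, 1.0)      # direction of compression

def apply(M, v):
    """Multiply the 2 x 2 matrix M by the column vector v."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

print(apply(C, v1), tuple(lam1 * c for c in v1))  # C v1 versus lam1 v1
print(apply(C, v2), tuple(lam2 * c for c in v2))  # C v2 versus lam2 v2
print(v1[0] * v2[0] + v1[1] * v2[1])              # dot product of v1 and v2
print(lam1 * lam2)                                # product equals det(C)
```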


Figure  10.14.10 

To explain the cause of the streaks in Figure 10.14.3, consider S to be part of the tiled plane, and let p be a point of S with period
n. Because we are considering tiling, there is a point q in the plane with the same color as p that on successive iterations moves
toward the position initially occupied by p, reaching that position on the nth iteration. This point is q = (A⁻¹)ⁿp = A⁻ⁿp, since

\[
A^n \mathbf{q} = A^n (A^{-n} \mathbf{p}) = \mathbf{p}
\]

Thus,  with  successive  iterations,  points  of  S flow  away  from  their  initial  positions,  while  at  the  same  time  other  points  in  the  plane 
(with  corresponding  colors)  flow  toward  those  initial  positions,  completing  their  trip  on  the  final  iteration  of  the  cycle.  Figure 

10.14.11  illustrates  this  in  the  case  where  n = 4,  q = | — -j,  j,  and  p = ^44q  = -jj.  Note  that 

p mod  1 = q mod  1 = j,  so  both  points  occupy  the  same  positions  on  their  respective  tiles.  The  outgoing  point  moves  in 

the general direction of the eigenvector v₁, as indicated by the arrows in Figure 10.14.11, and the incoming point moves in the
general direction of eigenvector v₂. It is the “flow lines” in the general directions of the eigenvectors that form the streaks in
Figure 10.14.3.
Figure 10.14.11


Nonperiodic  Points 

Thus far we have considered the effect of Arnold's cat map on pixel points of the form (m/p, n/p) for an arbitrary positive
integer p. We know that all such points are periodic. We now consider the effect of Arnold's cat map on an arbitrary point (a, b)
in S. We classify such points as rational if the coordinates a and b are both rational numbers, and irrational if at least one of the
coordinates is irrational. Every rational point is periodic, since it is a pixel point for a suitable choice of p. For example, the
rational point (r₁/s₁, r₂/s₂) can be written as (r₁s₂/(s₁s₂), r₂s₁/(s₁s₂)), so it is a pixel point with p = s₁s₂. It can be shown
(Exercise 13) that the converse is also true: Every periodic point must be a rational point.
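The rewriting of a rational point as a pixel point can be illustrated with exact fractions. This is a small sketch under the assumption that both coordinates are given as `fractions.Fraction` values in [0, 1); the function name `as_pixel_point` is hypothetical.

```python
from fractions import Fraction

def as_pixel_point(a, b):
    """Write (r1/s1, r2/s2) as (m/p, n/p) with p = s1*s2, returning (m, n, p)."""
    a, b = Fraction(a), Fraction(b)
    s1, s2 = a.denominator, b.denominator
    p = s1 * s2
    m = a.numerator * s2   # r1*s2
    n = b.numerator * s1   # r2*s1
    return m, n, p

m, n, p = as_pixel_point(Fraction(1, 2), Fraction(1, 3))
print(m, n, p)
```

The value p = s₁s₂ follows the text; a smaller p (the least common multiple of s₁ and s₂) would also work.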


It follows from the preceding discussion that the irrational points in S are nonperiodic, so that successive iterates of an irrational
point (x₀, y₀) in S must all be distinct points in S. Figure 10.14.12, which was computer generated, shows an irrational point and
selected  iterates  up  to  100,000.  For  the  particular  irrational  point  that  we  selected,  the  iterates  do  not  seem  to  cluster  in  any 
particular  region  of  S ; rather,  they  appear  to  be  spread  throughout  S , becoming  denser  with  successive  iterations. 


Figure  10.14.12 


The  behavior  of  the  iterates  in  Figure  10.14.12  is  sufficiently  important  that  there  is  some  terminology  associated  with  it.  We  say 
that  a set  D of  points  in  S is  dense  in  S if  every  circle  centered  at  any  point  of  S encloses  points  of  D , no  matter  how  small  the 
radius  of  the  circle  is  taken  (Figure  10.14.13).  It  can  be  shown  that  the  rational  points  are  dense  in  S and  the  iterates  of  most  (but 
not  all)  of  the  irrational  points  are  dense  in  S. 


Figure 10.14.13 (an arbitrary circle in S enclosing points of the set D)


Definition  of  Chaos 

We know that under Arnold's cat map, the rational points of S are periodic and dense in S and that some but not all of the
irrational points have iterates that are dense in S. These are the basic ingredients of chaos. There are several definitions of chaos in
current use, but the following one, which is an outgrowth of a definition introduced by Robert L. Devaney in 1986 in his book An
Introduction to Chaotic Dynamical Systems (Benjamin/Cummings Publishing Company), is most closely related to our work.



DEFINITION  1 

A mapping  T of  S onto  itself  is  said  to  be  chaotic  if: 

(i)  S contains  a dense  set  of  periodic  points  of  the  mapping  T. 

(ii)  There  is  a point  in  S whose  iterates  under  T are  dense  in  S. 



Thus  Arnold's  cat  map  satisfies  the  definition  of  a chaotic  mapping.  What  is  noteworthy  about  this  definition  is  that  a chaotic 
mapping  exhibits  an  element  of  order  and  an  element  of  disorder — the  periodic  points  move  regularly  in  cycles,  but  the  points 
with  dense  iterates  move  irregularly,  often  obscuring  the  regularity  of  the  periodic  points.  This  fusion  of  order  and  disorder 
characterizes  chaotic  mappings. 


Dynamical  Systems 

Chaotic  mappings  arise  in  the  study  of  dynamical  systems.  Informally  stated,  a dynamical  system  can  be  viewed  as  a system  that 
has  a specific  state  or  configuration  at  each  point  of  time  but  that  changes  its  state  with  time.  Chemical  systems,  ecological 
systems,  electrical  systems,  biological  systems,  economic  systems,  and  so  forth  can  be  looked  at  in  this  way.  In  a discrete-time 
dynamical  system , the  state  changes  at  discrete  points  of  time  rather  than  at  each  instant.  In  a discrete-time  chaotic  dynamical 
system , each  state  results  from  a chaotic  mapping  of  the  preceding  state.  For  example,  if  one  imagines  that  Arnold's  cat  map  is 
applied  at  discrete  points  of  time,  then  the  pixel  maps  in  Figure  10.14.3  can  be  viewed  as  the  evolution  of  a discrete-time  chaotic 
dynamical  system  from  some  initial  set  of  states  (each  point  of  the  cat  is  a single  initial  state)  to  successive  sets  of  states. 

One  of  the  fundamental  problems  in  the  study  of  dynamical  systems  is  to  predict  future  states  of  the  system  from  a known  initial 
state.  In  practice,  however,  the  exact  initial  state  is  rarely  known  because  of  errors  in  the  devices  used  to  measure  the  initial  state. 
It  was  believed  at  one  time  that  if  the  measuring  devices  were  sufficiently  accurate  and  the  computers  used  to  perform  the 
iteration  were  sufficiently  powerful,  then  one  could  predict  the  future  states  of  the  system  to  any  degree  of  accuracy.  But  the 
discovery  of  chaotic  systems  shattered  this  belief  because  it  was  found  that  for  such  systems  the  slightest  error  in  measuring  the 
initial  state  or  in  the  computation  of  the  iterates  becomes  magnified  exponentially,  thereby  preventing  an  accurate  prediction  of 
future  states.  Let  us  demonstrate  this  sensitivity  to  initial  conditions  with  Arnold's  cat  map. 

Suppose that P₀ is a point in the xy-plane whose exact coordinates are (0.77837, 0.70904). A measurement error of 0.00001 is
made in the y-coordinate, such that the point is thought to be located at (0.77837, 0.70905), which we denote by Q₀. Both P₀
and Q₀ are pixel points with p = 100,000 (why?), and thus, since Π(100,000) = 75,000, both return to their initial positions
after 75,000 iterations. In Figure 10.14.14 we show the first 50 iterates of P₀ under Arnold's cat map as crosses and the first 50
iterates of Q₀ as circles. Although P₀ and Q₀ are close enough that their symbols overlap initially, only their first eight iterates
have overlapping symbols; from the ninth iteration on their iterates follow divergent paths.
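The divergence of the two orbits can be reproduced with exact integer arithmetic. The following sketch assumes the points are stored as integer pairs scaled by p = 100,000; the helper names `cat_step`, `iterate`, and `wrapped_gap` are hypothetical, and the gap measure (largest coordinate difference on the unit square with opposite edges identified) is one convenient choice among several.

```python
P = 100_000  # pixel resolution: coordinates are integers m with value m / P

def cat_step(point, p=P):
    """One application of the cat map to a scaled pixel point."""
    m, n = point
    return (m + n) % p, (m + 2 * n) % p

def iterate(point, steps, p=P):
    """Apply the cat map the given number of times."""
    for _ in range(steps):
        point = cat_step(point, p)
    return point

def wrapped_gap(a, b, p=P):
    """Largest coordinate gap between a and b, as a fraction of the unit square."""
    def gap(u, v):
        d = (u - v) % p
        return min(d, p - d) / p
    return max(gap(a[0], b[0]), gap(a[1], b[1]))

P0 = (77837, 70904)  # exact initial point
Q0 = (77837, 70905)  # measured initial point

for k in range(13):
    print(k, wrapped_gap(iterate(P0, k), iterate(Q0, k)))
```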
Figure 10.14.14


It is possible to quantify the growth of the error from the eigenvalues and eigenvectors of Arnold's cat map. For this purpose we
will think of Arnold's cat map as a linear transformation on the tiled plane. Recall from Figure 10.14.10 and the related discussion
that the projected distance between two points in S in the direction of the eigenvector v₁ increases by a factor of 2.6180... (= λ₁)
with each iteration (Figure 10.14.15). After nine iterations this projected distance increases by a factor of
(2.6180...)⁹ = 5777.99..., and with an initial error of roughly 1/100,000 in the direction of v₁, this distance is 0.05777..., or
about 1/17 the width of the unit square S. After 12 iterations this small initial error grows to (2.6180...)¹²/100,000 = 1.0368...,
which is greater than the width of S. Thus, we completely lose track of the true iterates within S after 12 iterations because of the
exponential growth of the initial error.
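The arithmetic in this estimate is easy to reproduce. The sketch below assumes, as in the text, an initial error of 1/100,000 lying entirely in the direction of v₁; the names `error_after` and `first_n` are hypothetical.

```python
import math

lam1 = (3 + math.sqrt(5)) / 2   # dominant eigenvalue, about 2.6180
initial_error = 1 / 100_000     # assumed error in the direction of v1

def error_after(n):
    """Projected error after n iterations, growing by a factor lam1 each time."""
    return initial_error * lam1 ** n

# First iteration at which the error exceeds the width of the unit square.
first_n = next(n for n in range(1, 100) if error_after(n) > 1)

print(error_after(9), error_after(12), first_n)
```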
Figure 10.14.15

Although  sensitivity  to  initial  conditions  limits  the  ability  to  predict  the  future  evolution  of  dynamical  systems,  new  techniques 
are  presently  being  investigated  to  describe  this  future  evolution  in  alternative  ways. 


Exercise  Set  10.14 

1. In a journal article [F. J. Dyson and H. Falk, “Period of a Discrete Cat Mapping,” The American Mathematical Monthly, 99
(August-September 1992), pp. 603-614] the following results concerning the nature of the function Π(p) were established:

(i) Π(p) = 3p if and only if p = 2 · 5^k for k = 1, 2, ....

(ii) Π(p) = 2p if and only if p = 5^k for k = 1, 2, ... or p = 6 · 5^k for k = 0, 1, 2, ....

(iii) Π(p) ≤ 12p/7 for all other choices of p.

Find Π(250), Π(25), Π(125), Π(30), Π(10), Π(50), Π(3750), Π(6), and Π(5).

Answer:

Π(250) = 750, Π(25) = 50, Π(125) = 250, Π(30) = 60, Π(10) = 30, Π(50) = 150, Π(3750) = 7500, Π(6) = 12,
Π(5) = 10

2. Find all the n-cycles that are subsets of the 36 points in S of the form (m/6, n/6) with m and n in the range 0, 1, 2, 3, 4, 5.
Then find Π(6).

Answer: 

{(!-“)'( 

’ (ft}  (if)'  (i  e]}md 


3. (Fibonacci Shift-Register Random-Number Generator) A well-known method of generating a sequence of “pseudorandom”
integers x₀, x₁, x₂, x₃, ... in the interval from 0 to p − 1 is based on the following algorithm:

(i) Pick any two integers x₀ and x₁ from the range 0, 1, 2, ..., p − 1.

(ii) Set x_{n+1} = (x_n + x_{n−1}) mod p for n = 1, 2, ....

Here x mod p denotes the number in the interval from 0 to p − 1 that differs from x by a multiple of p. For example, 35 mod
9 = 8 (because 8 = 35 − 3 · 9); 36 mod 9 = 0 (because 0 = 36 − 4 · 9); and −3 mod 9 = 6 (because 6 = −3 + 1 · 9).

(a) Generate the sequence of pseudorandom numbers that results from the choices p = 15, x₀ = 3, and x₁ = 7 until the
sequence starts repeating.


(b) Show that the following formula is equivalent to step (ii) of the algorithm:

\[
\begin{bmatrix} x_{n+1} \\ x_{n+2} \end{bmatrix}
= \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_{n-1} \\ x_n \end{bmatrix} \bmod p
\quad \text{for } n = 1, 2, 3, \ldots
\]

(c) Use the formula in part (b) to generate the sequence of vectors for the choices p = 21, x₀ = 5, and x₁ = 5 until the
sequence starts repeating.


Answer: 


(a) 3, 7, 10, 2, 12, 14, 11, 10, 6, 1, 7, 8, 0, 8, 8, 1, 9, 10, 4, 14, 3, 2, 5, 7, 12, 4, 1, 5, 6, 11, 2, 13, 0, 13, 13, 11, 9, 5, 14, 4, 3, 7, ...
(c) (5, 5), (10, 15), (4, 19), (2, 0), (2, 2), (4, 6), (10, 16), (5, 0), (5, 5), ...


If we take p = 1 and pick x₀ and x₁ from the interval [0, 1), then the above random-number generator produces
pseudorandom numbers in the interval [0, 1). The resulting scheme is precisely Arnold's cat map. Furthermore, if we eliminate
the modular arithmetic in the algorithm and take x₀ = x₁ = 1, then the resulting sequence of integers is the famous Fibonacci
sequence, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ..., in which each number after the first two is the sum of the preceding two
numbers.
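The generator in Exercise 3 can be sketched directly from steps (i) and (ii). The function name `one_cycle` is hypothetical; it collects terms until the starting pair (x₀, x₁) reappears, which is one reasonable reading of "until the sequence starts repeating".

```python
def one_cycle(x0, x1, p):
    """Terms x0, x1, x2, ... up to (not including) the first repeat of the pair (x0, x1)."""
    start = (x0 % p, x1 % p)
    a, b = start
    terms = []
    while True:
        terms.append(a)
        a, b = b, (a + b) % p      # x_{n+1} = (x_n + x_{n-1}) mod p
        if (a, b) == start:
            return terms

seq = one_cycle(3, 7, 15)
print(seq)

# Pairing consecutive terms gives the vectors produced by the matrix form in part (b).
terms = one_cycle(5, 5, 21)
vectors = [(terms[i], terms[i + 1]) for i in range(0, len(terms), 2)]
print(vectors)
```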


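4. For

\[
C = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
\]

it can be verified that

\[
C^{25} = \begin{bmatrix} 7{,}778{,}742{,}049 & 12{,}586{,}269{,}025 \\ 12{,}586{,}269{,}025 & 20{,}365{,}011{,}074 \end{bmatrix}
\]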

It  can  also  be  verified  that  12,586,269,025  is  divisible  by  101  and  that  when  7,778,742,049  and  20,365,011,074  are  divided  by 
101,  the  remainder  is  1. 


(a) Show that every point in S of the form (m/101, n/101) returns to its starting position after 25 iterations under Arnold's
cat map.

(b) Show that every point in S of the form (m/101, n/101) has period 1, 5, or 25.

(c)  Show  that  the  point  | ^ , 0 J has  period  greater  than  5 by  iterating  it  five  times. 

(d) Show that Π(101) = 25.
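The numerical facts quoted at the start of Exercise 4 can be confirmed with exact integer arithmetic. The sketch below uses hypothetical helper names `mat_mul` and `mat_pow` and computes the power by repeated squaring; Python integers have arbitrary precision, so no overflow occurs.

```python
def mat_mul(A, B):
    """Product of two 2 x 2 integer matrices stored as nested tuples."""
    return (
        (A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]),
        (A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]),
    )

def mat_pow(A, n):
    """Compute A**n by repeated squaring."""
    result = ((1, 0), (0, 1))
    while n > 0:
        if n & 1:
            result = mat_mul(result, A)
        A = mat_mul(A, A)
        n >>= 1
    return result

C25 = mat_pow(((1, 1), (1, 2)), 25)
C25_mod_101 = tuple(tuple(entry % 101 for entry in row) for row in C25)
print(C25)
print(C25_mod_101)
```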


Answer: 


(c) 


The  first  five  iterates  of  (a^-.  o)  are  (ajip  (-jjjp  ajfp),  (ffp  -jjjj-),  (ajg-.  and  (a^p 


Show  that  for  the  mapping  T:S—*S  defined  by  T(x,  7)  = y j mod  1,  every  point  in  S is  a periodic  point.  Why  does 

this  show  that  the  mapping  is  not  chaotic? 

6. An Anosov automorphism on R² is a mapping from the unit square S onto S of the form

\[
\begin{bmatrix} x \\ y \end{bmatrix} \to \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \bmod 1
\]

in which (i) a, b, c, and d are integers, (ii) the determinant of the matrix is ±1, and (iii) the eigenvalues of the matrix do not
have magnitude 1. It can be shown that all Anosov automorphisms are chaotic mappings.

(a)  Show  that  Arnold's  cat  map  is  an  Anosov  automorphism. 

(b)  Which  of  the  following  are  the  matrices  of  an  Anosov  automorphism? 


1 

o 

'3  2' 

O 

1 0. 

'5  7' 
2 3 

_1  1_ 

'6  2' 
5 2 

_0  1_ 

(c)  Show  that  the  following  mapping  of  S onto  S is  not  an  Anosov  automorphism. 

PH->  j][;h 

What  is  the  geometric  effect  of  this  transformation  on  5?  Use  your  observation  to  show  that  the  mapping  is  not  a chaotic 
mapping  by  showing  that  all  points  in  S are  periodic  points. 


Answer: 

^ The  matrices  of  Anosov  automorphisms  are 
(c) The transformation effects a rotation of S through 90° in the clockwise direction.

7.  Show  that  Arnold's  cat  map  is  one-to-one  over  the  unit  square  S and  that  its  range  is  S. 

8.  Show  that  the  inverse  of  Arnold's  cat  map  is  given  by 

Γ⁻¹(x, y) = (2x − y, −x + y) mod 1
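A numerical spot check is not a proof, but it can make the formula in Exercise 8 plausible. The sketch below assumes exact rational coordinates and uses hypothetical names `cat_map` and `cat_map_inverse`; it tests the round trip on a small grid of points.

```python
from fractions import Fraction

def cat_map(x, y):
    """Arnold's cat map on the unit square."""
    return (x + y) % 1, (x + 2 * y) % 1

def cat_map_inverse(x, y):
    """Candidate inverse from Exercise 8."""
    return (2 * x - y) % 1, (-x + y) % 1

grid = [(Fraction(m, 7), Fraction(n, 7)) for m in range(7) for n in range(7)]
print(all(cat_map_inverse(*cat_map(x, y)) == (x, y) for x, y in grid))
```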

9. Show that the unit square S can be partitioned into four triangular regions on each of which Arnold's cat map is a
transformation of the form

\[
\begin{bmatrix} x \\ y \end{bmatrix} \to \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} a \\ b \end{bmatrix}
\]

where a and b need not be the same for each region. [Hint: Find the regions in S that map onto the four shaded regions of the
parallelogram in Figure 10.14.1d.]


3 2 

1 1 


and 


5 7 
2 3 


Answer: 

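[Figure: the unit square S partitioned into regions I, II, III, and IV, together with the images of these regions]

\[
\text{In region I: } \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}; \quad
\text{in region II: } \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \end{bmatrix}; \quad
\text{in region III: } \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} -1 \\ -1 \end{bmatrix}; \quad
\text{in region IV: } \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} -1 \\ -2 \end{bmatrix}
\]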

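10. If (x₀, y₀) is a point in S and (xₙ, yₙ) is its nth iterate under Arnold's cat map, show that

\[
\begin{bmatrix} x_n \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{n} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1
\]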

This  result  implies  that  the  modular  arithmetic  need  only  be  performed  once  rather  than  after  each  iteration. 

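11. Show that (0, 0) is the only fixed point of Arnold's cat map by showing that the only solution of the equation

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1
\]

with 0 ≤ x₀ < 1 and 0 ≤ y₀ < 1 is x₀ = y₀ = 0. [Hint: For appropriate nonnegative integers, r and s, we can write

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} - \begin{bmatrix} r \\ s \end{bmatrix}
\]

for the preceding equation.]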

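12. Find all 2-cycles of Arnold's cat map by finding all solutions of the equation

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{2} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1
\]

with 0 ≤ x₀ < 1 and 0 ≤ y₀ < 1. [Hint: For appropriate nonnegative integers, r and s, we can write

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} - \begin{bmatrix} r \\ s \end{bmatrix}
\]

for the preceding equation.]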


Answer: 


and  |y,  form  one  2-cycle,  and  j and  form  another  2-cycle. 

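13. Show that every periodic point of Arnold's cat map must be a rational point by showing that for all solutions of the equation

\[
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{n} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \bmod 1
\]

the numbers x₀ and y₀ are quotients of integers.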

14. Let T be the Arnold's cat map applied five times in a row; that is, T = Γ⁵. Figure Ex-14 represents four successive mappings
of T on the first image, each image having a resolution of 101 × 101 pixels. The fifth mapping returns to the first image
because this cat map has a period of 25. Explain how you might generate this particular sequence of images.


Figure  Ex-14 


Answer: 

Begin with a 101 × 101 array of white pixels and add the letter ‘A’ in black pixels to it. Apply the mapping to this image,
which  will  scatter  the  black  pixels  throughout  the  image.  Then  superimpose  the  letter  ‘B’  in  black  pixels  onto  this  image. 
Apply  the  mapping  again  and  then  superimpose  the  letter  ‘C’  in  black  pixels  onto  the  resulting  image.  Repeat  this  procedure 
with  the  letters  ‘D’  and  ‘E’.  The  next  application  of  the  mapping  will  return  you  to  the  letter  ‘A’  with  the  pixels  for  the  letters 
‘B’  through  ‘E’  scattered  in  the  background. 

Technology  Exercises 


The following exercises are designed to be solved using a technology utility. Typically, this will be MATLAB, Mathematica,
Maple,  Derive,  or  Mathcad,  but  it  may  also  be  some  other  type  of  linear  algebra  software  or  a scientific  calculator  with  some 
linear  algebra  capabilities.  For  each  exercise  you  will  need  to  read  the  relevant  documentation  for  the  particular  utility  you  are 
using.  The  goal  of  these  exercises  is  to  provide  you  with  a basic  proficiency  with  your  technology  utility.  Once  you  have 
mastered  the  techniques  in  these  exercises,  you  will  be  able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the 
regular  exercise  sets. 


T1. The methods of Exercise 4 show that for the cat map, Π(p) is the smallest integer satisfying the equation

\[
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{\Pi(p)} \bmod p = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]

This suggests that one way to determine Π(p) is to compute

\[
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{n} \bmod p
\]

starting with n = 1 and stopping when this produces the identity matrix. Use this idea to compute Π(p) for p = 2, 3, ....
Compare your results to the formulas given in Exercise 1, if they apply. What can you conjecture about

\[
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}^{\Pi(p)/2} \bmod p
\]

when Π(p) is even?
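One possible way to carry out the computation described in T1 is sketched below in Python rather than in any of the utilities named above. The function name `cat_matrix_order` is hypothetical; it multiplies successive powers of the matrix by C, reducing mod p at each step, until the identity matrix appears.

```python
def cat_matrix_order(p):
    """Smallest n >= 1 with C**n congruent to the identity mod p, where C = [[1, 1], [1, 2]]."""
    identity = (1 % p, 0, 0, 1 % p)
    a, b, c, d = 1 % p, 1 % p, 1 % p, 2 % p   # entries of C mod p, row by row
    n = 1
    while (a, b, c, d) != identity:
        # Multiply the current power [[a, b], [c, d]] on the right by C.
        a, b, c, d = (a + b) % p, (a + 2 * b) % p, (c + d) % p, (c + 2 * d) % p
        n += 1
    return n

for p in range(2, 11):
    print(p, cat_matrix_order(p))
```

Reducing mod p after every multiplication keeps the entries small, so the loop stays fast even when the period runs into the tens of thousands.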

T2.  The  eigenvalues  and  eigenvectors  for  the  cat  map  matrix 


vl=  1 4-  v 5 • v2  — 1 — ^5 

2 2 


Using  these  eigenvalues  and  eigenvectors,  we  can  define 


3 + y 5 
2 


W5 

2 


and  P=  I + 1/5  l-i/5 


and write C = PDP⁻¹; hence, Cⁿ = PDⁿP⁻¹. Use a computer to show that

(«)  00 

C"=  11  12 

M M 

c2l  c22 


(ri)_(  l + |/5  \(  3-/5  f /l-/5  V 3 + |/S 


>)_>)_  1 p+ 

c12  ~c21  - rr\  - 


" (3 -J5\n 


How can you use these results and your conclusions in Exercise T1 to simplify the method for computing Π(p)?
10.15 Cryptography

In  this  section  we  present  a method  of  encoding  and  decoding  messages.  We  also  examine  modular  arithmetic  and  show 
how  Gaussian  elimination  can  sometimes  be  used  to  break  an  opponent's  code. 


Prerequisites 

Matrices 

Gaussian  Elimination 

Matrix  Operations 

Linear  Independence 

Linear  Transformations  (Section  4.9) 


Ciphers 

The  study  of  encoding  and  decoding  secret  messages  is  called  cryptography.  Although  secret  codes  date  to  the  earliest  days 
of  written  communication,  there  has  been  a recent  surge  of  interest  in  the  subject  because  of  the  need  to  maintain  the 
privacy  of  information  transmitted  over  public  lines  of  communication.  In  the  language  of  cryptography,  codes  are  called 
ciphers , uncoded  messages  are  called  plaintext , and  coded  messages  are  called  ciphertext.  The  process  of  converting  from 
plaintext  to  ciphertext  is  called  enciphering , and  the  reverse  process  of  converting  from  ciphertext  to  plaintext  is  called 
deciphering. 

The simplest ciphers, called substitution ciphers, are those that replace each letter of the alphabet by a different letter. For
example, in the substitution cipher

Plain   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Cipher  D E F G H I J K L M N O P Q R S T U V W X Y Z A B C

the plaintext letter A is replaced by D, the plaintext letter B by E, and so forth. With this cipher the plaintext message

ROME WAS NOT BUILT IN A DAY

becomes

URPH ZDV QRW EXLOW LQ D GDB
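The substitution cipher in this example shifts each letter three places forward in the alphabet. A minimal sketch of that shift follows, with the hypothetical function name `shift_encipher`; it assumes uppercase plaintext and leaves spaces and other characters unchanged.

```python
import string

def shift_encipher(plaintext, shift=3):
    """Replace each uppercase letter by the letter `shift` places later, wrapping from Z to A."""
    result = []
    for ch in plaintext:
        if ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(ch)  # keep spaces and punctuation as they are
    return "".join(result)

print(shift_encipher("ROME WAS NOT BUILT IN A DAY"))
```

Deciphering under the same assumptions amounts to calling the function with a shift of −3.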


Hill  Ciphers 

A disadvantage  of  substitution  ciphers  is  that  they  preserve  the  frequencies  of  individual  letters,  making  it  relatively  easy  to 
break  the  code  by  statistical  methods.  One  way  to  overcome  this  problem  is  to  divide  the  plaintext  into  groups  of  letters  and 
encipher  the  plaintext  group  by  group,  rather  than  one  letter  at  a time.  A system  of  cryptography  in  which  the  plaintext  is 
divided  into  sets  of  n letters,  each  of  which  is  replaced  by  a set  of  n cipher  letters,  is  called  a polygraphic  system.  In  this 
section  we  will  study  a class  of  polygraphic  systems  based  on  matrix  transformations.  [The  ciphers  that  we  will  discuss  are 
called  Hill  ciphers  after  Lester  S.  Hill,  who  introduced  them  in  two  papers:  “Cryptography  in  an  Algebraic  Alphabet,” 
American Mathematical Monthly, 36 (June-July 1929), pp. 306-312; and “Concerning Certain Linear Transformation
Apparatus  of  Cryptography,”  American  Mathematical  Monthly,  38  (March  1931),  pp.  135-154.] 


In  the  discussion  to  follow,  we  assume  that  each  plaintext  and  ciphertext  letter  except  Z is  assigned  the  numerical  value  that 
specifies  its  position  in  the  standard  alphabet  (Table  1).  For  reasons  that  will  become  clear  later,  Z is  assigned  a value  of 
zero. 


Table 1

A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z
1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  0

In  the  simplest  Hill  ciphers,  successive  pairs  of  plaintext  are  transformed  into  ciphertext  by  the  following  procedure: 


Step 1 Choose a 2 x 2 matrix with integer entries

\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \]

to perform the encoding. Certain additional conditions on A will be imposed later.

Step  2 Group  successive  plaintext  letters  into  pairs,  adding  an  arbitrary  “dummy”  letter  to  fill  out  the  last  pair  if  the 
plaintext  has  an  odd  number  of  letters,  and  replace  each  plaintext  letter  by  its  numerical  value. 

Step 3 Successively convert each plaintext pair $p_1 p_2$ into a column vector

\[ \mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} \]

and form the product $A\mathbf{p}$. We will call $\mathbf{p}$ a plaintext vector and $A\mathbf{p}$ the corresponding ciphertext vector.

Step 4 Convert each ciphertext vector into its alphabetic equivalent.


EXAMPLE  1 Hill  Cipher  of  a Message 

Use the matrix

\[ \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \]

to obtain the Hill cipher for the plaintext message

I AM HIDING


If we group the plaintext into pairs and add the dummy letter G to fill out the last pair, we obtain

IA  MH  ID  IN  GG

or, equivalently, from Table 1,

9 1   13 8   9 4   9 14   7 7

To encipher the pair IA, we form the matrix product

\[ \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 9 \\ 1 \end{bmatrix} = \begin{bmatrix} 11 \\ 3 \end{bmatrix} \]

which, from Table 1, yields the ciphertext KC.

To encipher the pair MH, we form the product

\[ \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 13 \\ 8 \end{bmatrix} = \begin{bmatrix} 29 \\ 24 \end{bmatrix} \tag{1} \]

However, there is a problem here, because the number 29 has no alphabet equivalent (Table 1). To resolve
this problem, we make the following agreement:

Whenever an integer greater than 25 occurs, it will be replaced by the remainder that results when this
integer is divided by 26.

Because the remainder after division by 26 is one of the integers 0, 1, 2, ..., 25, this procedure will always
yield an integer with an alphabet equivalent.

Thus, in (1) we replace 29 by 3, which is the remainder after dividing 29 by 26. It now follows from Table 1
that the ciphertext for the pair MH is CX.

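The computations for the remaining ciphertext vectors are

\[ \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 9 \\ 4 \end{bmatrix} = \begin{bmatrix} 17 \\ 12 \end{bmatrix} \]

\[ \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 9 \\ 14 \end{bmatrix} = \begin{bmatrix} 37 \\ 42 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} 11 \\ 16 \end{bmatrix} \pmod{26} \]

\[ \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 7 \\ 7 \end{bmatrix} = \begin{bmatrix} 21 \\ 21 \end{bmatrix} \]

These correspond to the ciphertext pairs QL, KP, and UU, respectively. In summary, the entire ciphertext
message is

KC  CX  QL  KP  UU

which would usually be transmitted as a single string without spaces:

KCCXQLKPUU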
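Steps 1 through 4 can be sketched in a few lines of code. The helper names below (`letter_to_number`, `number_to_letter`, `hill2_encipher`) are hypothetical, and repeating the last letter as the dummy is an assumption that happens to match the choice of G in Example 1; the procedure itself allows any dummy letter.

```python
def letter_to_number(ch):
    """Table 1 convention: A=1, B=2, ..., Y=25, Z=0."""
    return (ord(ch) - ord("A") + 1) % 26


def number_to_letter(k):
    """Inverse of letter_to_number for k in 0..25."""
    return chr((k - 1) % 26 + ord("A"))


def hill2_encipher(plaintext, A, dummy=None):
    """Encipher with a Hill 2-cipher; A is a 2 x 2 list of integers."""
    letters = [ch for ch in plaintext.upper() if "A" <= ch <= "Z"]
    if len(letters) % 2 == 1:
        # Assumption: pad an odd-length message by repeating its last letter.
        letters.append(dummy if dummy is not None else letters[-1])
    out = []
    for i in range(0, len(letters), 2):
        p1 = letter_to_number(letters[i])
        p2 = letter_to_number(letters[i + 1])
        # Form Ap and reduce each entry to its remainder modulo 26.
        c1 = (A[0][0] * p1 + A[0][1] * p2) % 26
        c2 = (A[1][0] * p1 + A[1][1] * p2) % 26
        out.append(number_to_letter(c1) + number_to_letter(c2))
    return "".join(out)


print(hill2_encipher("I AM HIDING", [[1, 2], [0, 3]]))
```

Assigning Z the value 0 is what lets a single `% 26` map every product entry back to a letter.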


Because the plaintext was grouped in pairs and enciphered by a 2 x 2 matrix, the Hill cipher in Example 1 is referred to as
a Hill 2-cipher. It is also possible to group the plaintext in triples and encipher by a 3 x 3 matrix with integer
entries; this is called a Hill 3-cipher. In general, for a Hill n-cipher, plaintext is grouped into sets of n letters and
enciphered by an n x n matrix with integer entries.


Modular  Arithmetic 

In  Example  1,  integers  greater  than  25  were  replaced  by  their  remainders  after  division  by  26.  This  technique  of  working 
with  remainders  is  at  the  core  of  a body  of  mathematics  called  modular  arithmetic.  Because  of  its  importance  in 
cryptography,  we  will  digress  for  a moment  to  touch  on  some  of  the  main  ideas  in  this  area. 

In  modular  arithmetic  we  are  given  a positive  integer  m , called  the  modulus , and  any  two  integers  whose  difference  is  an 
integer  multiple  of  the  modulus  are  regarded  as  “equal”  or  “equivalent”  with  respect  to  the  modulus.  More  precisely,  we 
make  the  following  definition. 


DEFINITION  1 

If  m is  a positive  integer  and  a and  b are  any  integers,  then  we  say  that  a is  equivalent  to  b modulo  m , written 

a = b (mod  m) 


if a - b is an integer multiple of m.


EXAMPLE  2 Various  Equivalences 


7 = 2 (mod 5)

19 = 3 (mod 2)

-1 = 25 (mod 26)

12 = 0 (mod 4)

For  any  modulus  m it  can  be  proved  that  every  integer  a is  equivalent,  modulo  m , to  exactly  one  of  the  integers 

0, 1, 2, ..., m - 1

We call this integer the residue of a modulo m, and we write

\[ Z_m = \{0, 1, 2, \ldots, m - 1\} \]

to  denote  the  set  of  residues  modulo  m. 
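As a small illustration of residues, the sketch below relies on the fact that Python's `%` operator returns a value in 0, 1, ..., m - 1 for any integer a when m is positive, so it yields the residue directly even for negative a. The function name `residue` is an illustrative choice.

```python
def residue(a, m):
    """Return the unique r in {0, 1, ..., m-1} such that a - r is a multiple of m."""
    if m <= 0:
        raise ValueError("the modulus must be a positive integer")
    return a % m  # never negative in Python when m > 0


for a, m in [(7, 5), (19, 2), (-1, 26), (12, 4), (29, 26)]:
    print(a, "=", residue(a, m), "(mod", str(m) + ")")
```

Many other languages return a negative remainder for negative a, in which case the result must be adjusted by adding m.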

If  a is  a nonnegative  integer,  then  its  residue  modulo  m is  simply  the  remainder  that  results  when  a is  divided  by  m.  For  an 
arbitrary  integer  a , the  residue  can  be  found  using  the  following  theorem. 


THEOREM  10.15.1 

For any integer a and modulus m, let

\[ R = \text{remainder of } \frac{|a|}{m} \]

Then the residue r of a modulo m is given by

\[ r = \begin{cases} R & \text{if } a \ge 0 \\ m - R & \text{if } a < 0 \text{ and } R \ne 0 \\ 0 & \text{if } a < 0 \text{ and } R = 0 \end{cases} \]

EXAMPLE 3 Residues mod 26

Find the residue modulo 26 of (a) 87, (b) -38, and (c) -26.

Solution

(a) Dividing |87| = 87 by 26 yields a remainder of R = 9, so r = 9. Thus,

87 = 9 (mod 26)

(b) Dividing |-38| = 38 by 26 yields a remainder of R = 12, so r = 26 - 12 = 14. Thus,

-38 = 14 (mod 26)

(c) Dividing |-26| = 26 by 26 yields a remainder of R = 0. Thus,

-26 = 0 (mod 26)


In ordinary arithmetic every nonzero number a has a reciprocal or multiplicative inverse, denoted by $a^{-1}$, such that

\[ a a^{-1} = a^{-1} a = 1 \]

In  modular  arithmetic  we  have  the  following  corresponding  concept: 


DEFINITION  2 

If a is a number in $Z_m$, then a number $a^{-1}$ in $Z_m$ is called a reciprocal or multiplicative inverse of a modulo m if
$a a^{-1} = a^{-1} a = 1 \pmod{m}$.


It  can  be  proved  that  if  a and  m have  no  common  prime  factors,  then  a has  a unique  reciprocal  modulo  m ; conversely,  if  a 
and  m have  a common  prime  factor,  then  a has  no  reciprocal  modulo  m. 

EXAMPLE  4 Reciprocal  of  3 mod  26 


The  number  3 has  a reciprocal  modulo  26  because  3 and  26  have  no  common  prime  factors.  This  reciprocal 
can  be  obtained  by  finding  the  number  x in  Z26  that  satisfies  the  modular  equation 

3x  = 1 (mod  26) 


Although  there  are  general  methods  for  solving  such  modular  equations,  it  would  take  us  too  far  afield  to 
study  them.  However,  because  26  is  relatively  small,  this  equation  can  be  solved  by  trying  the  possible 
solutions,  0 to  25,  one  at  a time.  With  this  approach  we  find  that  x = 9 is  the  solution,  because 

\[ 3 \cdot 9 = 27 = 1 \pmod{26} \]


Thus,


\[ 3^{-1} = 9 \pmod{26} \]
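The trial-and-error search used in Example 4 is easy to express in code. This is a minimal sketch; the name `reciprocal_by_trial` is hypothetical, and returning `None` when no reciprocal exists is a design choice made here for illustration.

```python
def reciprocal_by_trial(a, m=26):
    """Try x = 0, 1, ..., m-1 in turn and return the first x with
    a*x = 1 (mod m); return None if no such x exists."""
    for x in range(m):
        if (a * x) % m == 1:
            return x
    return None


print(reciprocal_by_trial(3))  # reciprocal of 3 modulo 26
```

For a modulus as small as 26 the search examines at most 26 candidates, so nothing more sophisticated is needed.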


EXAMPLE  5 A Number  with  No  Reciprocal  mod  26 

The number 4 has no reciprocal modulo 26, because 4 and 26 have 2 as a common prime factor (see Exercise 9).


For future reference, in Table 2 we provide the following reciprocals modulo 26:

Table 2  Reciprocals Modulo 26

a        1   3   5   7   9   11  15  17  19  21  23  25
a^{-1}   1   9   21  15  3   19  7   23  11  5   17  25
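Table 2 can be regenerated mechanically: keep the values of a that share no prime factor with 26 (that is, whose greatest common divisor with 26 is 1) and compute the reciprocal of each. The sketch below assumes Python 3.8 or later, where the built-in `pow(a, -1, m)` returns a modular reciprocal; the function name `reciprocal_table` is illustrative.

```python
from math import gcd


def reciprocal_table(m=26):
    """Map each a in 1..m-1 that is relatively prime to m to its reciprocal mod m."""
    table = {}
    for a in range(1, m):
        if gcd(a, m) == 1:             # no common prime factor with m
            table[a] = pow(a, -1, m)   # modular reciprocal (Python 3.8+)
    return table


for a, a_inv in reciprocal_table().items():
    print(a, a_inv)
```

The table is symmetric in the sense that if b is the reciprocal of a, then a is the reciprocal of b.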

Deciphering 

Every useful cipher must have a procedure for decipherment. In the case of a Hill cipher, decipherment uses the inverse
(mod 26) of the enciphering matrix. To be precise, if m is a positive integer, then a square matrix A with entries in $Z_m$ is
said to be invertible modulo m if there is a matrix B with entries in $Z_m$ such that

\[ AB = BA = I \pmod{m} \]

Suppose now that

\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \]

is invertible modulo 26 and this matrix is used in a Hill 2-cipher. If

\[ \mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} \]

is a plaintext vector, then

\[ \mathbf{c} = A\mathbf{p} \pmod{26} \]

is the corresponding ciphertext vector and

\[ \mathbf{p} = A^{-1}\mathbf{c} \pmod{26} \]

Thus, each plaintext vector can be recovered from the corresponding ciphertext vector by multiplying it on the left by
$A^{-1}$ (mod 26).


In  cryptography  it  is  important  to  know  which  matrices  are  invertible  modulo  26  and  how  to  obtain  their  inverses.  We  now 
investigate  these  questions. 

In ordinary arithmetic, a square matrix A is invertible if and only if $\det(A) \ne 0$, or, equivalently, if and only if $\det(A)$ has a
reciprocal. The following theorem is the analog of this result in modular arithmetic.


THEOREM 10.15.2

A square matrix A with entries in $Z_m$ is invertible modulo m if and only if the residue of $\det(A)$ modulo m has a
reciprocal modulo m.


Because the residue of $\det(A)$ modulo m will have a reciprocal modulo m if and only if this residue and m have no common
prime factors, we have the following corollary.


COROLLARY 10.15.3

A square matrix A with entries in $Z_m$ is invertible modulo m if and only if m and the residue of $\det(A)$ modulo m
have no common prime factors.


Because the only prime factors of m = 26 are 2 and 13, we have the following corollary, which is useful in cryptography.


COROLLARY 10.15.4

A square matrix A with entries in $Z_{26}$ is invertible modulo 26 if and only if the residue of $\det(A)$ modulo 26 is not
divisible by 2 or 13.


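We leave it for you to verify that if

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

has entries in $Z_{26}$ and the residue of $\det(A) = ad - bc$ modulo 26 is not divisible by 2 or 13, then the inverse of A (mod
26) is given by

\[ A^{-1} = (ad - bc)^{-1} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \pmod{26} \tag{2} \]

where $(ad - bc)^{-1}$ is the reciprocal of the residue of $ad - bc$ (mod 26).

EXAMPLE 6 Inverse of a Matrix mod 26

Find the inverse of

\[ A = \begin{bmatrix} 5 & 6 \\ 2 & 3 \end{bmatrix} \]

modulo 26.

Solution

\[ \det(A) = ad - bc = 5 \cdot 3 - 6 \cdot 2 = 3 \]

so from Table 2,

\[ (ad - bc)^{-1} = 3^{-1} = 9 \pmod{26} \]

Thus, from (2),

\[ A^{-1} = 9 \begin{bmatrix} 3 & -6 \\ -2 & 5 \end{bmatrix} = \begin{bmatrix} 27 & -54 \\ -18 & 45 \end{bmatrix} = \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \pmod{26} \]

As a check,

\[ AA^{-1} = \begin{bmatrix} 5 & 6 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} = \begin{bmatrix} 53 & 234 \\ 26 & 105 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \pmod{26} \]

Similarly, $A^{-1}A = I$.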
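Formula (2) translates directly into code. The following is a sketch under the stated conventions, with hypothetical helper names (`reciprocal`, `inverse_2x2_mod`); it finds the reciprocal of the determinant by trial, as in Example 4, and raises an error when the determinant has none.

```python
def reciprocal(a, m=26):
    """Reciprocal of the residue of a modulo m, or None if it has none."""
    a %= m
    for x in range(1, m):
        if (a * x) % m == 1:
            return x
    return None


def inverse_2x2_mod(A, m=26):
    """Inverse of the 2 x 2 matrix A modulo m via formula (2)."""
    (a, b), (c, d) = A
    det_inv = reciprocal(a * d - b * c, m)
    if det_inv is None:
        raise ValueError("matrix is not invertible modulo %d" % m)
    # (ad - bc)^{-1} * [[d, -b], [-c, a]], each entry reduced to its residue
    return [[(det_inv * d) % m, (det_inv * -b) % m],
            [(det_inv * -c) % m, (det_inv * a) % m]]


print(inverse_2x2_mod([[5, 6], [2, 3]]))
```

Reducing each entry with `% m` at the end replaces negative entries such as -54 and -18 by their residues 24 and 8.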


EXAMPLE  7 Decoding  a Hill  2-Cipher 

Decode  the  following  Hill  2-cipher,  which  was  enciphered  by  the  matrix  in  Example  6: 

GTNKGKDUSK 

From  Table  1 the  numerical  equivalent  of  this  ciphertext  is 

7 20  14  11  7 11  4 21  19  11 

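To obtain the plaintext pairs, we multiply each ciphertext vector by the inverse of A (obtained in Example 6):

\[ \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 7 \\ 20 \end{bmatrix} = \begin{bmatrix} 487 \\ 436 \end{bmatrix} = \begin{bmatrix} 19 \\ 20 \end{bmatrix} \pmod{26} \]

\[ \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 14 \\ 11 \end{bmatrix} = \begin{bmatrix} 278 \\ 321 \end{bmatrix} = \begin{bmatrix} 18 \\ 9 \end{bmatrix} \pmod{26} \]

\[ \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 7 \\ 11 \end{bmatrix} = \begin{bmatrix} 271 \\ 265 \end{bmatrix} = \begin{bmatrix} 11 \\ 5 \end{bmatrix} \pmod{26} \]

\[ \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 4 \\ 21 \end{bmatrix} = \begin{bmatrix} 508 \\ 431 \end{bmatrix} = \begin{bmatrix} 14 \\ 15 \end{bmatrix} \pmod{26} \]

\[ \begin{bmatrix} 1 & 24 \\ 8 & 19 \end{bmatrix} \begin{bmatrix} 19 \\ 11 \end{bmatrix} = \begin{bmatrix} 283 \\ 361 \end{bmatrix} = \begin{bmatrix} 23 \\ 23 \end{bmatrix} \pmod{26} \]

From Table 1, the alphabet equivalents of these vectors are

ST  RI  KE  NO  WW

which yields the message

STRIKE NOW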
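Deciphering is the same matrix-times-pair computation as enciphering, only with the inverse matrix. The sketch below uses hypothetical helper names (`to_numbers`, `to_letters`, `hill2_apply`) and the Table 1 convention A=1, ..., Y=25, Z=0; it leaves any dummy letter in place, since the cipher itself cannot tell which letters were padding.

```python
def to_numbers(text):
    """Table 1 values of the capital letters in text (A=1, ..., Y=25, Z=0)."""
    return [(ord(ch) - ord("A") + 1) % 26 for ch in text if "A" <= ch <= "Z"]


def to_letters(nums):
    """Letters corresponding to a list of Table 1 values."""
    return "".join(chr((k - 1) % 26 + ord("A")) for k in nums)


def hill2_apply(text, M):
    """Multiply successive letter pairs of text by the 2 x 2 matrix M modulo 26."""
    nums = to_numbers(text.upper())
    if len(nums) % 2 != 0:
        raise ValueError("text must contain an even number of letters")
    out = []
    for i in range(0, len(nums), 2):
        x, y = nums[i], nums[i + 1]
        out.append((M[0][0] * x + M[0][1] * y) % 26)
        out.append((M[1][0] * x + M[1][1] * y) % 26)
    return to_letters(out)


A_inv = [[1, 24], [8, 19]]  # inverse found in Example 6
print(hill2_apply("GTNKGKDUSK", A_inv))
```

Applying `hill2_apply` to the recovered plaintext with the enciphering matrix of Example 6 reproduces the original ciphertext, which is a convenient round-trip check.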


Breaking  a Hill  Cipher 

Because  the  purpose  of  enciphering  messages  and  information  is  to  prevent  “opponents”  from  learning  their  contents, 
cryptographers  are  concerned  with  the  security  of  their  ciphers — that  is,  how  readily  they  can  be  broken  (deciphered  by 
their  opponents).  We  will  conclude  this  section  by  discussing  one  technique  for  breaking  Hill  ciphers. 

Suppose  that  you  are  able  to  obtain  some  corresponding  plaintext  and  ciphertext  from  an  opponent's  message.  For  example, 
on  examining  some  intercepted  ciphertext,  you  may  be  able  to  deduce  that  the  message  is  a letter  that  begins  DEAR  SIR.  We 
will  show  that  with  a small  amount  of  such  data,  it  may  be  possible  to  determine  the  deciphering  matrix  of  a Hill  code  and 
consequently  obtain  access  to  the  rest  of  the  message. 

It is a basic result in linear algebra that a linear transformation is completely determined by its values at a basis. This
principle suggests that if we have a Hill n-cipher, and if

\[ \mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n \]

are linearly independent plaintext vectors whose corresponding ciphertext vectors

\[ A\mathbf{p}_1, A\mathbf{p}_2, \ldots, A\mathbf{p}_n \]

are known, then there is enough information available to determine the matrix A and hence $A^{-1}$ (mod m).

The  following  theorem,  whose  proof  is  discussed  in  the  exercises,  provides  a way  to  do  this. 


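THEOREM 10.15.5  Determining the Deciphering Matrix

Let $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_n$ be linearly independent plaintext vectors, and let $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$ be the corresponding ciphertext
vectors in a Hill n-cipher. If

\[ P = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \\ \vdots \\ \mathbf{p}_n^T \end{bmatrix} \]

is the n x n matrix with row vectors $\mathbf{p}_1^T, \mathbf{p}_2^T, \ldots, \mathbf{p}_n^T$ and if

\[ C = \begin{bmatrix} \mathbf{c}_1^T \\ \mathbf{c}_2^T \\ \vdots \\ \mathbf{c}_n^T \end{bmatrix} \]

is the n x n matrix with row vectors $\mathbf{c}_1^T, \mathbf{c}_2^T, \ldots, \mathbf{c}_n^T$, then the sequence of elementary row operations that reduces C
to I transforms P to $(A^{-1})^T$.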


This theorem tells us that to find the transpose of the deciphering matrix $A^{-1}$, we must find a sequence of row operations
that reduces C to I and then perform this same sequence of operations on P. The following example illustrates a simple
algorithm for doing this.

EXAMPLE  8 Using  Theorem  10.15.5 


The  following  Hill  2-cipher  is  intercepted: 

IOSBTGXESPXHOPDE 

Decipher  the  message,  given  that  it  starts  with  the  word  DEAR. 

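From Table 1, the numerical equivalent of the known plaintext is

DE    AR
4 5   1 18

and the numerical equivalent of the corresponding ciphertext is

IO     SB
9 15   19 2

so the corresponding plaintext and ciphertext vectors are

\[ \mathbf{p}_1 = \begin{bmatrix} 4 \\ 5 \end{bmatrix} \leftrightarrow \mathbf{c}_1 = \begin{bmatrix} 9 \\ 15 \end{bmatrix}, \qquad \mathbf{p}_2 = \begin{bmatrix} 1 \\ 18 \end{bmatrix} \leftrightarrow \mathbf{c}_2 = \begin{bmatrix} 19 \\ 2 \end{bmatrix} \]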

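We want to reduce

\[ C = \begin{bmatrix} \mathbf{c}_1^T \\ \mathbf{c}_2^T \end{bmatrix} = \begin{bmatrix} 9 & 15 \\ 19 & 2 \end{bmatrix} \]

to I by elementary row operations and simultaneously apply these operations to

\[ P = \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \end{bmatrix} = \begin{bmatrix} 4 & 5 \\ 1 & 18 \end{bmatrix} \]

to obtain $(A^{-1})^T$ (the transpose of the deciphering matrix). This can be accomplished by adjoining P to the
right of C and applying row operations to the resulting matrix $[C \mid P]$ until the left side is reduced to I. The
final matrix will then have the form $[I \mid (A^{-1})^T]$. The computations can be carried out as follows: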


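\[ \left[\begin{array}{cc|cc} 9 & 15 & 4 & 5 \\ 19 & 2 & 1 & 18 \end{array}\right] \]
We formed the matrix $[C \mid P]$.

\[ \left[\begin{array}{cc|cc} 1 & 45 & 12 & 15 \\ 19 & 2 & 1 & 18 \end{array}\right] \]
We multiplied the first row by $9^{-1} = 3$.

\[ \left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 19 & 2 & 1 & 18 \end{array}\right] \]
We replaced 45 by its residue modulo 26.

\[ \left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & -359 & -227 & -267 \end{array}\right] \]
We added -19 times the first row to the second.

\[ \left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & 5 & 7 & 19 \end{array}\right] \]
We replaced the entries in the second row by their residues modulo 26.

\[ \left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & 1 & 147 & 399 \end{array}\right] \]
We multiplied the second row by $5^{-1} = 21$.

\[ \left[\begin{array}{cc|cc} 1 & 19 & 12 & 15 \\ 0 & 1 & 17 & 9 \end{array}\right] \]
We replaced the entries in the second row by their residues modulo 26.

\[ \left[\begin{array}{cc|cc} 1 & 0 & -311 & -156 \\ 0 & 1 & 17 & 9 \end{array}\right] \]
We added -19 times the second row to the first.

\[ \left[\begin{array}{cc|cc} 1 & 0 & 1 & 0 \\ 0 & 1 & 17 & 9 \end{array}\right] \]
We replaced the entries in the first row by their residues modulo 26.

Thus,

\[ (A^{-1})^T = \begin{bmatrix} 1 & 0 \\ 17 & 9 \end{bmatrix} \]

so the deciphering matrix is

\[ A^{-1} = \begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \]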

To  decipher  the  message,  we  first  group  the  ciphertext  into  pairs  and  find  the  numerical  equivalent  of  each 
letter: 

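IO     SB     TG     XE     SP      XH     OP      DE
9 15   19 2   20 7   24 5   19 16   24 8   15 16   4 5

Next, we multiply successive ciphertext vectors on the left by $A^{-1}$ and find the alphabet equivalents of the
resulting plaintext pairs (all equalities are modulo 26):

\[ \begin{aligned}
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 9 \\ 15 \end{bmatrix} &= \begin{bmatrix} 4 \\ 5 \end{bmatrix} \quad \begin{matrix} D \\ E \end{matrix} &
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 19 \\ 2 \end{bmatrix} &= \begin{bmatrix} 1 \\ 18 \end{bmatrix} \quad \begin{matrix} A \\ R \end{matrix} \\
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 20 \\ 7 \end{bmatrix} &= \begin{bmatrix} 9 \\ 11 \end{bmatrix} \quad \begin{matrix} I \\ K \end{matrix} &
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 24 \\ 5 \end{bmatrix} &= \begin{bmatrix} 5 \\ 19 \end{bmatrix} \quad \begin{matrix} E \\ S \end{matrix} \\
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 19 \\ 16 \end{bmatrix} &= \begin{bmatrix} 5 \\ 14 \end{bmatrix} \quad \begin{matrix} E \\ N \end{matrix} &
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 24 \\ 8 \end{bmatrix} &= \begin{bmatrix} 4 \\ 20 \end{bmatrix} \quad \begin{matrix} D \\ T \end{matrix} \\
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 15 \\ 16 \end{bmatrix} &= \begin{bmatrix} 1 \\ 14 \end{bmatrix} \quad \begin{matrix} A \\ N \end{matrix} &
\begin{bmatrix} 1 & 17 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \end{bmatrix} &= \begin{bmatrix} 11 \\ 19 \end{bmatrix} \quad \begin{matrix} K \\ S \end{matrix}
\end{aligned} \]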

Finally,  we  construct  the  message  from  the  plaintext  pairs: 

DE  AR  IK  ES  EN  DT  AN  KS 
DEAR  IKE  SEND  TANKS 


Further  Readings 

Readers  interested  in  learning  more  about  mathematical  cryptography  are  referred  to  the  following  books,  the  first 
of  which  is  elementary  and  the  second  more  advanced. 

1.  Abraham  Sinkov,  Elementary  Cryptanalysis,  a Mathematical  Approach  (Mathematical  Association  of  America,  2009). 

2.  Alan  G.  Konheim,  Cryptography,  a Primer  (New  York:  Wiley-Interscience,  1981). 


Exercise  Set  10.15 


1. Obtain the Hill cipher of the message

DARK NIGHT

for each of the following enciphering matrices:

(a) \[ \begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix} \]

(b) \[ \begin{bmatrix} 4 & 3 \\ 1 & 2 \end{bmatrix} \]


Answer: 


(a)  GIYUOKEVBH 

(b)  SFANEFZWJH 


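2. In each part determine whether the matrix is invertible modulo 26. If so, find its inverse modulo 26 and check your work
by verifying that $AA^{-1} = A^{-1}A = I$ (mod 26).

(a) \[ A = \begin{bmatrix} 9 & 1 \\ 7 & 2 \end{bmatrix} \]

(b) \[ A = \begin{bmatrix} 3 & 1 \\ 5 & 3 \end{bmatrix} \]

(c) \[ A = \begin{bmatrix} 8 & 11 \\ 1 & 9 \end{bmatrix} \]

(d) \[ A = \begin{bmatrix} 2 & 1 \\ 1 & 7 \end{bmatrix} \]

(e) \[ A = \begin{bmatrix} 3 & 1 \\ 6 & 2 \end{bmatrix} \]

(f) \[ A = \begin{bmatrix} 1 & 8 \\ 1 & 3 \end{bmatrix} \]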


Answer: 


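(a) \[ A^{-1} = \begin{bmatrix} 12 & 7 \\ 23 & 15 \end{bmatrix} \]

(b) Not invertible

(c) \[ A^{-1} = \begin{bmatrix} 1 & 19 \\ 23 & 24 \end{bmatrix} \]

(d) Not invertible

(e) Not invertible

(f) \[ A^{-1} = \begin{bmatrix} 15 & 12 \\ 21 & 5 \end{bmatrix} \]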


3.  Decode  the  message 


SAKNOXAOJX 


given  that  it  is  a Hill  cipher  with  enciphering  matrix 


\[ \begin{bmatrix} 4 & 1 \\ 3 & 2 \end{bmatrix} \]


Answer: 


WE  LOVE  MATH 

4.  A Hill  2-cipher  is  intercepted  that  starts  with  the  pairs 

SLHK 

Find  the  deciphering  and  enciphering  matrices,  given  that  the  plaintext  is  known  to  start  with  the  word  ARMY. 


Answer: 

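\[ \text{Deciphering matrix} = \begin{bmatrix} 7 & 15 \\ 6 & 5 \end{bmatrix}; \qquad \text{enciphering matrix} = \begin{bmatrix} 7 & 5 \\ 2 & 15 \end{bmatrix} \]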


5.  Decode  the  following  Hill  2-cipher  if  the  last  four  plaintext  letters  are  known  to  be  ATOM. 


LNGIHGYBVRBNJYQO


Answer: 

THEY  SPLIT  THE  ATOM 

6.  Decode  the  following  Hill  3-cipher  if  the  first  nine  plaintext  letters  are  IHAVECOME : 

HPAFQGGDUGDDHPGODYNOR 

Answer: 

I HAVE  COME  TO  BURY  CAESAR 

7. All of the results of this section can be generalized to the case where the plaintext is a binary message; that is, it is a
sequence of 0's and 1's. In this case we do all of our modular arithmetic using modulus 2 rather than modulus 26. Thus,
for example, 1 + 1 = 0 (mod 2). Suppose we want to encrypt the message 110101111. Let us first break it into triplets to
form the three vectors

\[ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \]

and let us take

\[ \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \]

as our enciphering matrix.

(a) Find the encoded message.

(b) Find the inverse modulo 2 of the enciphering matrix, and verify that it decodes your encoded message.

Answer:

(a) 010110001

(b) \[ \begin{bmatrix} 0 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix} \]

8.  If,  in  addition  to  the  standard  alphabet,  a period,  comma,  and  question  mark  were  allowed,  then  29  plaintext  and 
ciphertext  symbols  would  be  available  and  all  matrix  arithmetic  would  be  done  modulo  29.  Under  what  conditions 
would  a matrix  with  entries  in  Z29  be  invertible  modulo  29? 

Answer: 


A is invertible modulo 29 if and only if $\det(A) \ne 0$ (mod 29).

9. Show that the modular equation 4x = 1 (mod 26) has no solution in $Z_{26}$ by successively substituting the values
x = 0, 1, 2, ..., 25.

10. (a) Let P and C be the matrices in Theorem 10.15.5. Show that $P = C(A^{-1})^T$.

(b) To prove Theorem 10.15.5, let $E_1, E_2, \ldots, E_n$ be the elementary matrices that correspond to the row operations that
reduce C to I, so

\[ E_n \cdots E_2 E_1 C = I \]

Show that

\[ E_n \cdots E_2 E_1 P = (A^{-1})^T \]

from which it follows that the same sequence of row operations that reduces C to I converts P to $(A^{-1})^T$.


11. (a) If A is the enciphering matrix of a Hill n-cipher, show that

\[ A^{-1} = (C^{-1}P)^T \pmod{26} \]

where C and P are the matrices defined in Theorem 10.15.5.

(b) Instead of using Theorem 10.15.5 as in the text, find the deciphering matrix of Example 8 by using the result in
part (a) and Equation (2) to compute $C^{-1}$. [Note: Although this method is practical for Hill 2-ciphers, Theorem
10.15.5 is more efficient for Hill n-ciphers with n > 2.]


Technology  Exercises 


The  following  exercises  are  designed  to  be  solved  using  a technology  utility.  Typically,  this  will  be  matlab,  Mathematica, 
Maple,  Derive,  or  Mathcad,  but  it  may  also  be  some  other  type  of  linear  algebra  software  or  a scientific  calculator  with 
some  linear  algebra  capabilities.  For  each  exercise  you  will  need  to  read  the  relevant  documentation  for  the  particular 
utility  you  are  using.  The  goal  of  these  exercises  is  to  provide  you  with  a basic  proficiency  with  your  technology  utility. 
Once  you  have  mastered  the  techniques  in  these  exercises,  you  will  be  able  to  use  your  technology  utility  to  solve  many  of 
the  problems  in  the  regular  exercise  sets. 


T1. Two integers that have no common factors (except 1) are said to be relatively prime. Given a positive integer n, let
$S_n = \{a_1, a_2, a_3, \ldots, a_m\}$, where $a_1 < a_2 < a_3 < \cdots < a_m$, be the set of all positive integers less than n and relatively
prime to n. For example, if n = 9, then

\[ S_9 = \{a_1, a_2, a_3, \ldots, a_6\} = \{1, 2, 4, 5, 7, 8\} \]

(a) Construct a table consisting of n and $S_n$ for n = 2, 3, ..., 15, and then compute

\[ \sum_{k=1}^{m} a_k \quad \text{and} \quad \left( \sum_{k=1}^{m} a_k \right) \pmod{n} \]

in each case. Draw a conjecture for n > 15 and prove your conjecture to be true. [Hint: Use the fact that if a is
relatively prime to n, then n - a is also relatively prime to n.]


(b) Given a positive integer n and the set $S_n$, let $P_n$ be the m x m matrix


so that, for example,

Use a computer to compute $\det(P_n)$ and $\det(P_n)$ (mod n) for n = 2, 3, ..., 15, and then use these results to construct a
conjecture.


(c) Use the results of part (a) to prove your conjecture to be true. [Hint: Add the first m - 1 rows of $P_n$ to its last row and
then use Theorem 2.2.3.] What do these results imply about the inverse of $P_n$ (mod n)?


T2. Given a positive integer n greater than 1, the number of positive integers less than n and relatively prime to n is called
the Euler phi function of n and is denoted by $\varphi(n)$. For example, $\varphi(6) = 2$ since only two positive integers (1 and 5) are
less than 6 and have no common factor with 6.

(a) Using a computer, for each value of n = 2, 3, ..., 25 compute and print out all positive integers that are less than n and
relatively prime to n. Then use these integers to determine the values of $\varphi(n)$ for n = 2, 3, ..., 25. Can you discover a
pattern in the results?


(b) It can be shown that if $\{p_1, p_2, p_3, \ldots, p_m\}$ are all the distinct prime factors of n, then

\[ \varphi(n) = n \left(1 - \frac{1}{p_1}\right) \left(1 - \frac{1}{p_2}\right) \cdots \left(1 - \frac{1}{p_m}\right) \]

For example, since {2, 3} are the distinct prime factors of 12, we have

\[ \varphi(12) = 12 \left(1 - \frac{1}{2}\right) \left(1 - \frac{1}{3}\right) = 4 \]

which agrees with the fact that {1, 5, 7, 11} are the only positive integers less than 12 and relatively prime to 12.
Using a computer, print out all the prime factors of n for n = 2, 3, ..., 25. Then compute $\varphi(n)$ using the formula above
and compare it to your results in part (a).
10.16  Genetics

In  this  section  we  investigate  the  propagation  of  an  inherited  trait  in  successive  generations  by  computing 
powers  of  a matrix. 


Prerequisites 

Eigenvalues  and  Eigenvectors 
Diagonalization  of  a Matrix 
Intuitive  Understanding  of  Limits 


Inheritance  Traits 

In  this  section  we  examine  the  inheritance  of  traits  in  animals  or  plants.  The  inherited  trait  under  consideration 
is  assumed  to  be  governed  by  a set  of  two  genes,  which  we  designate  by  A and  a.  Under  autosomal 
inheritance  each  individual  in  the  population  of  either  gender  possesses  two  of  these  genes,  the  possible 
pairings  being  designated  AA,  Aa,  and  aa.  This  pair  of  genes  is  called  the  individual's  genotype,  and  it 
determines  how  the  trait  controlled  by  the  genes  is  manifested  in  the  individual.  For  example,  in  snapdragons 
a set of two genes determines the color of the flower. Genotype AA produces red flowers, genotype Aa
produces pink flowers, and genotype aa produces white flowers. In humans, eye coloration is controlled
through autosomal inheritance. Genotypes AA and Aa have brown eyes, and genotype aa has blue eyes. In this
case we say that gene A dominates gene a, or that gene a is recessive to gene A, because genotype Aa has the
same outward trait as genotype AA.

In  addition  to  autosomal  inheritance  we  will  also  discuss  X-linked  inheritance.  In  this  type  of  inheritance,  the 
male  of  the  species  possesses  only  one  of  the  two  possible  genes  (A  or  a),  and  the  female  possesses  a pair  of 
the  two  genes  (AA,  aa,  or  Aa).  In  humans,  color  blindness,  hereditary  baldness,  hemophilia,  and  muscular 
dystrophy,  to  name  a few,  are  traits  controlled  by  X-linked  inheritance. 

Below  we  explain  the  manner  in  which  the  genes  of  the  parents  are  passed  on  to  their  offspring  for  the  two 
types  of  inheritance.  We  construct  matrix  models  that  give  the  probable  genotypes  of  the  offspring  in  terms  of 
the  genotypes  of  the  parents,  and  we  use  these  matrix  models  to  follow  the  genotype  distribution  of  a 
population  through  successive  generations. 


Autosomal  Inheritance 

In autosomal inheritance an individual inherits one gene from each of its parents' pairs of genes to form its
own particular pair. As far as we know, it is a matter of chance which of the two genes a parent passes on to
the offspring. Thus, if one parent is of genotype Aa, it is equally likely that the offspring will inherit the A
gene or the a gene from that parent. If one parent is of genotype aa and the other parent is of genotype Aa, the
offspring will always receive an a gene from the aa parent and will receive either an A gene or an a gene, with
equal probability, from the Aa parent. Consequently, each of the offspring has equal probability of being
genotype aa or Aa. In Table 1 we list the probabilities of the possible genotypes of the offspring for all
possible combinations of the genotypes of the parents.

Table 1

Genotype        Genotypes of Parents
of Offspring    AA-AA   AA-Aa   AA-aa   Aa-Aa   Aa-aa   aa-aa
AA              1       1/2     0       1/4     0       0
Aa              0       1/2     1       1/2     1/2     0
aa              0       0       0       1/4     1/2     1


EXAMPLE 1 Distribution of Genotypes in a Population

Suppose  that  a farmer  has  a large  population  of  plants  consisting  of  some  distribution  of  all 
three  possible  genotypes  AA , Aa , and  aa.  The  farmer  desires  to  undertake  a breeding  program  in 
which  each  plant  in  the  population  is  always  fertilized  with  a plant  of  genotype  AA  and  is  then 
replaced  by  one  of  its  offspring.  We  want  to  derive  an  expression  for  the  distribution  of  the 
three  possible  genotypes  in  the  population  after  any  number  of  generations. 

For n = 0, 1, 2, ..., let us set

$a_n$ = fraction of plants of genotype AA in nth generation
$b_n$ = fraction of plants of genotype Aa in nth generation
$c_n$ = fraction of plants of genotype aa in nth generation

Thus $a_0$, $b_0$, and $c_0$ specify the initial distribution of the genotypes. We also have that

\[ a_n + b_n + c_n = 1 \quad \text{for } n = 0, 1, 2, \ldots \]

From Table 1 we can determine the genotype distribution of each generation from the genotype
distribution of the preceding generation by the following equations:

\[ \begin{aligned} a_n &= a_{n-1} + \tfrac{1}{2} b_{n-1} \\ b_n &= c_{n-1} + \tfrac{1}{2} b_{n-1} \\ c_n &= 0 \end{aligned} \qquad n = 1, 2, \ldots \tag{1} \]

For  example,  the  first  of  these  three  equations  states  that  all  the  offspring  of  a plant  of  genotype 
AA  will  be  of  genotype  AA  under  this  breeding  program  and  that  half  of  the  offspring  of  a plant 
of  genotype  Aa  will  be  of  genotype  AA. 


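Equations (1) can be written in matrix notation as

\[ \mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}, \quad n = 1, 2, \ldots \tag{2} \]

where

\[ \mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \\ c_n \end{bmatrix}, \quad \mathbf{x}^{(n-1)} = \begin{bmatrix} a_{n-1} \\ b_{n-1} \\ c_{n-1} \end{bmatrix}, \quad \text{and} \quad M = \begin{bmatrix} 1 & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} & 1 \\ 0 & 0 & 0 \end{bmatrix} \]

Note that the three columns of the matrix M are the same as the first three columns of Table 1.
From Equation (2) it follows that

\[ \mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)} = M^2\mathbf{x}^{(n-2)} = \cdots = M^n\mathbf{x}^{(0)} \tag{3} \]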


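Consequently, if we can find an explicit expression for $M^n$, we can use (3) to obtain an explicit
expression for $\mathbf{x}^{(n)}$. To find an explicit expression for $M^n$, we first diagonalize M. That is, we
find an invertible matrix P and a diagonal matrix D such that

\[ M = PDP^{-1} \tag{4} \]

With such a diagonalization, we then have (see Exercise 1)

\[ M^n = PD^nP^{-1} \quad \text{for } n = 1, 2, \ldots \]

where

\[ D^n = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{bmatrix}^n = \begin{bmatrix} \lambda_1^n & 0 & \cdots & 0 \\ 0 & \lambda_2^n & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_k^n \end{bmatrix} \]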

The  diagonalization  of  Mis  accomplished  by  finding  its  eigenvalues  and  corresponding 
eigenvectors.  These  are  as  follows  (verify): 

Eigenvalues:  Aj  = 1,  A2  = A3  = 0 

'll  [1]  f 

Corresponding  eigenvectors:  vj  = 0 , V2  = — 1 , V3  = —2 

oj  [ °J  1 

Thus,  in  Equation  4 we  have 

'Ai  0 0 1 10  0 

D=  0 A2  0 = 0-^0 

A 2 

. 0 0 A3  J 0 0 0 


and 


p=  [V1IV2IV3]  = 


Using  the  fact  that  ciQ  + &q  -f  cq  = l,we  thus  have 


These  are  explicit  formulas  for  the  fractions  of  the  three  genotypes  in  the  nth  generation  of 
plants  in  terms  of  the  initial  genotype  fractions. 


Because $\left(\tfrac12\right)^n$ tends to zero as n approaches infinity, it follows from these equations that

$$a_n \to 1, \qquad b_n \to 0, \qquad c_n = 0$$


as  n approaches  infinity.  That  is,  in  the  limit  all  plants  in  the  population  will  be  genotype  AA. 
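
As a quick numerical check of Example 1, the recurrence $\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}$ can be iterated directly and compared with the closed-form fractions that the diagonalization yields. The Python sketch below is not part of the original text: the function names and the initial distribution are illustrative assumptions, and exact rational arithmetic is used so that the comparison is exact.

```python
from fractions import Fraction as F

# Transition matrix of Example 1 (every plant fertilized with genotype AA).
M = [[F(1), F(1, 2), F(0)],
     [F(0), F(1, 2), F(1)],
     [F(0), F(0),    F(0)]]

def mat_vec(A, x):
    """Return the matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def iterate(x0, n):
    """Apply x <- M x a total of n times."""
    x = list(x0)
    for _ in range(n):
        x = mat_vec(M, x)
    return x

def closed_form(x0, n):
    """Fractions (a_n, b_n, c_n) from the diagonalization, for n >= 1.

    Assumes the entries of x0 sum to 1.
    """
    a0, b0, c0 = x0
    bn = F(1, 2) ** n * b0 + F(1, 2) ** (n - 1) * c0
    return [1 - bn, bn, F(0)]

x0 = [F(1, 4), F(1, 2), F(1, 4)]   # hypothetical initial fractions
for n in (1, 2, 5, 10):
    assert iterate(x0, n) == closed_form(x0, n)
print([float(v) for v in iterate(x0, 10)])
```

After ten generations the printed fraction of genotype AA is already very close to 1, in line with the limiting behavior described above.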


EXAMPLE  2 Modifying  Example  1 


We can modify Example 1 so that instead of each plant being fertilized with one of genotype
AA, each plant is fertilized with a plant of its own genotype. Using the same notation as in
Example 1, we then find

$$\mathbf{x}^{(n)} = M^n\mathbf{x}^{(0)}$$

where

$$M = \begin{bmatrix} 1 & \tfrac14 & 0 \\ 0 & \tfrac12 & 0 \\ 0 & \tfrac14 & 1 \end{bmatrix}$$

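The columns of this new matrix M are the same as the columns of Table 1 corresponding to
parents with genotypes AA-AA, Aa-Aa, and aa-aa.

The eigenvalues of M are (verify)

$$\lambda_1 = 1, \qquad \lambda_2 = 1, \qquad \lambda_3 = \tfrac12$$

The eigenvalue $\lambda_1 = 1$ has multiplicity two and its corresponding eigenspace is
two-dimensional. Picking two linearly independent eigenvectors $\mathbf{v}_1$ and $\mathbf{v}_2$ in that eigenspace,
and a single eigenvector $\mathbf{v}_3$ for the simple eigenvalue $\lambda_3 = \tfrac12$, we have (verify)

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$$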

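The calculations for $\mathbf{x}^{(n)}$ are then

$$\mathbf{x}^{(n)} = M^n\mathbf{x}^{(0)} = PD^nP^{-1}\mathbf{x}^{(0)} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & -2 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \left(\tfrac12\right)^n \end{bmatrix} \begin{bmatrix} 1 & \tfrac12 & 0 \\ 0 & \tfrac12 & 1 \\ 0 & -\tfrac12 & 0 \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix}$$

Thus,

$$\begin{aligned} a_n &= a_0 + \left[\tfrac12 - \left(\tfrac12\right)^{n+1}\right] b_0 \\ b_n &= \left(\tfrac12\right)^n b_0 \\ c_n &= c_0 + \left[\tfrac12 - \left(\tfrac12\right)^{n+1}\right] b_0 \end{aligned} \qquad n = 1, 2, \ldots \tag{6}$$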


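In the limit, as n tends to infinity, $\left(\tfrac12\right)^n \to 0$ and $\left(\tfrac12\right)^{n+1} \to 0$, so

$$\begin{aligned} a_n &\to a_0 + \tfrac12 b_0 \\ b_n &\to 0 \\ c_n &\to c_0 + \tfrac12 b_0 \end{aligned}$$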

Thus,  fertilization  of  each  plant  with  one  of  its  own  genotype  produces  a population  that  in  the 
limit  contains  only  genotypes  AA  and  aa. 


Autosomal  Recessive  Diseases 

There  are  many  genetic  diseases  governed  by  autosomal  inheritance  in  which  a normal  gene  A dominates  an 
abnormal gene a. Genotype AA is a normal individual; genotype Aa is a carrier of the disease but is not
afflicted  with  the  disease;  and  genotype  aa  is  afflicted  with  the  disease.  In  humans  such  genetic  diseases  are 
often  associated  with  a particular  racial  group — for  instance,  cystic  fibrosis  (predominant  among  Caucasians), 
sickle-cell  anemia  (predominant  among  people  of  African  origin),  Cooley's  anemia  (predominant  among 
people  of  Mediterranean  origin),  and  Tay-Sachs  disease  (predominant  among  Eastern  European  Jews). 

Suppose  that  an  animal  breeder  has  a population  of  animals  that  carries  an  autosomal  recessive  disease. 
Suppose  further  that  those  animals  afflicted  with  the  disease  do  not  survive  to  maturity.  One  possible  way  to 
control  such  a disease  is  for  the  breeder  to  always  mate  a female,  regardless  of  her  genotype,  with  a normal 
male.  In  this  way,  all  future  offspring  will  either  have  a normal  father  and  a normal  mother  (AA-AA  matings) 
or  a normal  father  and  a carrier  mother  ( AA-Aa  matings).  There  can  be  no  AA-aa  matings  since  animals  of 
genotype  aa  do  not  survive  to  maturity.  Under  this  type  of  mating  program  no  future  offspring  will  be 
afflicted  with  the  disease,  although  there  will  still  be  carriers  in  future  generations.  Let  us  now  determine  the 
fraction of carriers in future generations. We set

$$\mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \end{bmatrix}, \qquad n = 1, 2, \ldots$$

where

$a_n$ = fraction of population of genotype AA in nth generation
$b_n$ = fraction of population of genotype Aa (carriers) in nth generation

Because each offspring has at least one normal parent, we may consider the controlled mating program as one
of continual mating with genotype AA, as in Example 1. Thus, the transition of genotype distributions from
one generation to the next is governed by the equation

$$\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}, \qquad n = 1, 2, \ldots$$

where

$$M = \begin{bmatrix} 1 & \tfrac12 \\ 0 & \tfrac12 \end{bmatrix}$$


Because we know the initial distribution $\mathbf{x}^{(0)}$, the distribution of genotypes in the nth generation is thus given
by

$$\mathbf{x}^{(n)} = M^n\mathbf{x}^{(0)}, \qquad n = 1, 2, \ldots$$

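The diagonalization of M is easily carried out (see Exercise 4) and leads to

$$\mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)} = \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \left(\tfrac12\right)^n \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} a_0 \\ b_0 \end{bmatrix} = \begin{bmatrix} a_0 + b_0 - \left(\tfrac12\right)^n b_0 \\ \left(\tfrac12\right)^n b_0 \end{bmatrix}$$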
Because $a_0 + b_0 = 1$, we have

$$\begin{aligned} a_n &= 1 - \left(\tfrac12\right)^n b_0 \\ b_n &= \left(\tfrac12\right)^n b_0 \end{aligned} \qquad n = 1, 2, \ldots \tag{7}$$

Thus, as n tends to infinity, we have

$$a_n \to 1, \qquad b_n \to 0$$

so in the limit there will be no carriers in the population.
From 7 we see that

$$b_n = \tfrac12 b_{n-1}, \qquad n = 1, 2, \ldots \tag{8}$$


That  is,  the  fraction  of  carriers  in  each  generation  is  one-half  the  fraction  of  carriers  in  the  preceding 
generation.  It  would  be  of  interest  also  to  investigate  the  propagation  of  carriers  under  random  mating,  when 
two  animals  mate  without  regard  to  their  genotypes.  Unfortunately,  such  random  mating  leads  to  nonlinear 
equations,  and  the  techniques  of  this  section  are  not  applicable.  However,  by  other  techniques  it  can  be  shown 
that  under  random  mating,  Equation  8 is  replaced  by 


$$b_n = \frac{b_{n-1}}{1 + \tfrac12 b_{n-1}}, \qquad n = 1, 2, \ldots \tag{9}$$


As a numerical example, suppose that the breeder starts with a population in which 10% of the animals are
carriers. Under the controlled-mating program governed by Equation 8, the percentage of carriers can be
reduced to 5% in one generation. But under random mating, Equation 9 predicts that 9.5% of the population
will be carriers after one generation ($b_n = .095$ if $b_{n-1} = .10$). In addition, under controlled mating no
offspring will ever be afflicted with the disease, but with random mating it can be shown that about 1 in 400
offspring will be born with the disease when 10% of the population are carriers.
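
The comparison in this numerical example can be reproduced in a few lines. The following Python sketch is only an illustration: the function names are hypothetical, and the last computation assumes (as one plausible reading of the "1 in 400" figure) that each parent is independently a carrier with probability 0.10 and that a carrier passes on gene a with probability 1/2.

```python
def controlled(b_prev):
    """Equation 8: carrier fraction after one generation of controlled mating."""
    return 0.5 * b_prev

def random_mating(b_prev):
    """Equation 9: carrier fraction after one generation of random mating."""
    return b_prev / (1 + 0.5 * b_prev)

b0 = 0.10                                   # 10% carriers initially
print(f"controlled mating: {controlled(b0):.4f}")
print(f"random mating:     {random_mating(b0):.4f}")

# Assumed model for an afflicted (aa) offspring under random mating:
# both parents must be carriers and both must pass on gene a.
afflicted = (b0 / 2) ** 2
print(f"afflicted fraction: {afflicted:.4f}")
```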


X-Linked  Inheritance 

As  mentioned  in  the  introduction,  in  X-linked  inheritance  the  male  possesses  one  gene  (A  or  a)  and  the  female 
possesses two genes (AA, Aa, or aa). The term X-linked is used because such genes are found on the
X-chromosome,  of  which  the  male  has  one  and  the  female  has  two.  The  inheritance  of  such  genes  is  as 
follows:  A male  offspring  receives  one  of  his  mother's  two  genes  with  equal  probability,  and  a female 
offspring  receives  the  one  gene  of  her  father  and  one  of  her  mother's  two  genes  with  equal  probability. 

Readers  familiar  with  basic  probability  can  verify  that  this  type  of  inheritance  leads  to  the  genotype 
probabilities  in  Table  2. 

Table 2

                                  Genotypes of Parents (Father, Mother)
                       (A, AA)  (A, Aa)  (A, aa)  (a, AA)  (a, Aa)  (a, aa)
Male offspring     A      1       1/2       0        1       1/2       0
                   a      0       1/2       1        0       1/2       1
Female offspring   AA     1       1/2       0        0        0        0
                   Aa     0       1/2       1        1       1/2       0
                   aa     0        0        0        0       1/2       1
We will discuss a program of inbreeding in connection with X-linked inheritance. We begin with a male and
female;  select  two  of  their  offspring  at  random,  one  of  each  gender,  and  mate  them;  select  two  of  the  resulting 
offspring  and  mate  them;  and  so  forth.  Such  inbreeding  is  commonly  performed  with  animals.  (Among 
humans,  such  brother-sister  marriages  were  used  by  the  rulers  of  ancient  Egypt  to  keep  the  royal  line  pure.) 

The  original  male-female  pair  can  be  one  of  the  six  types,  corresponding  to  the  six  columns  of  Table  2: 

(A, AA), (A, Aa), (A, aa), (a, AA), (a, Aa), (a, aa)

The sibling pairs mated in each successive generation have certain probabilities of being one of these six
types. To compute these probabilities, for n = 0, 1, 2, ..., let us set

$a_n$ = probability sibling-pair mated in nth generation is type (A, AA)
$b_n$ = probability sibling-pair mated in nth generation is type (A, Aa)
$c_n$ = probability sibling-pair mated in nth generation is type (A, aa)
$d_n$ = probability sibling-pair mated in nth generation is type (a, AA)
$e_n$ = probability sibling-pair mated in nth generation is type (a, Aa)
$f_n$ = probability sibling-pair mated in nth generation is type (a, aa)

With these probabilities we form a column vector

$$\mathbf{x}^{(n)} = \begin{bmatrix} a_n \\ b_n \\ c_n \\ d_n \\ e_n \\ f_n \end{bmatrix}, \qquad n = 0, 1, 2, \ldots$$


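From Table 2 it follows that

$$\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}, \qquad n = 1, 2, \ldots \tag{10}$$

where (with rows and columns both ordered (A, AA), (A, Aa), (A, aa), (a, AA), (a, Aa), (a, aa))

$$M = \begin{bmatrix} 1 & \tfrac14 & 0 & 0 & 0 & 0 \\ 0 & \tfrac14 & 0 & 1 & \tfrac14 & 0 \\ 0 & 0 & 0 & 0 & \tfrac14 & 0 \\ 0 & \tfrac14 & 0 & 0 & 0 & 0 \\ 0 & \tfrac14 & 1 & 0 & \tfrac14 & 0 \\ 0 & 0 & 0 & 0 & \tfrac14 & 1 \end{bmatrix}$$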

For example, suppose that in the (n − 1)-st generation, the sibling pair mated is type (A, Aa). Then their
male offspring will be genotype A or a with equal probability, and their female offspring will be genotype AA
or Aa with equal probability. Because one of the male offspring and one of the female offspring are chosen at
random for mating, the next sibling pair will be one of type (A, AA), (A, Aa), (a, AA), or (a, Aa) with
equal probability. Thus, the second column of M contains 1/4 in each of the four rows corresponding to these
four sibling pairs. (See Exercise 9 for the remaining columns.)
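
The column-by-column reasoning just described can be mechanized. The sketch below is an illustration rather than part of the original text (the names `PAIRS` and `offspring_pair_distribution` are hypothetical): it builds all six columns of M directly from the inheritance rule, in which a son receives one of his mother's two genes and a daughter receives her father's gene together with one of her mother's two genes.

```python
from fractions import Fraction as F
from itertools import product

# Sibling-pair types (male, female), in the order used for the rows and columns of M.
PAIRS = [("A", "AA"), ("A", "Aa"), ("A", "aa"),
         ("a", "AA"), ("a", "Aa"), ("a", "aa")]

def offspring_pair_distribution(father, mother):
    """Probability of each (brother, sister) type produced by the given parents."""
    dist = {}
    # The son's gene and the daughter's maternal gene are chosen independently,
    # each uniformly from the mother's two genes.
    for son_gene, daughter_gene in product(mother, repeat=2):
        # Sorting writes the heterozygous genotype as "Aa" regardless of order.
        sister = "".join(sorted(father + daughter_gene))
        key = (son_gene, sister)
        dist[key] = dist.get(key, F(0)) + F(1, 4)
    return dist

# Column j of M is the distribution produced by parents of type PAIRS[j].
M = [[offspring_pair_distribution(*parents).get(child, F(0)) for parents in PAIRS]
     for child in PAIRS]

for row in M:
    print("  ".join(f"{str(v):>3}" for v in row))
```

Each column of the resulting matrix sums to 1, as it must for a matrix of transition probabilities.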


As in our previous examples, it follows from 10 that

$$\mathbf{x}^{(n)} = M^n\mathbf{x}^{(0)}, \qquad n = 1, 2, \ldots \tag{11}$$


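After lengthy calculations, the eigenvalues and eigenvectors of M turn out to be

$$\lambda_1 = 1, \quad \lambda_2 = 1, \quad \lambda_3 = \tfrac12, \quad \lambda_4 = -\tfrac12, \quad \lambda_5 = \tfrac14(1 + \sqrt5), \quad \lambda_6 = \tfrac14(1 - \sqrt5)$$

with corresponding eigenvectors given, in the same order, by the columns of the matrix P below.

The diagonalization of M then leads to

$$\mathbf{x}^{(n)} = PD^nP^{-1}\mathbf{x}^{(0)}, \qquad n = 1, 2, \ldots \tag{12}$$

where

$$P = \begin{bmatrix} 1 & 0 & -1 & 1 & \tfrac14(-3-\sqrt5) & \tfrac14(-3+\sqrt5) \\ 0 & 0 & 2 & -6 & 1 & 1 \\ 0 & 0 & -1 & -3 & \tfrac14(-1+\sqrt5) & \tfrac14(-1-\sqrt5) \\ 0 & 0 & 1 & 3 & \tfrac14(-1+\sqrt5) & \tfrac14(-1-\sqrt5) \\ 0 & 0 & -2 & 6 & 1 & 1 \\ 0 & 1 & 1 & -1 & \tfrac14(-3-\sqrt5) & \tfrac14(-3+\sqrt5) \end{bmatrix}$$

$$D^n = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & \left(\tfrac12\right)^n & 0 & 0 & 0 \\ 0 & 0 & 0 & \left(-\tfrac12\right)^n & 0 & 0 \\ 0 & 0 & 0 & 0 & \left[\tfrac14(1+\sqrt5)\right]^n & 0 \\ 0 & 0 & 0 & 0 & 0 & \left[\tfrac14(1-\sqrt5)\right]^n \end{bmatrix}$$

$$P^{-1} = \begin{bmatrix} 1 & \tfrac23 & \tfrac13 & \tfrac23 & \tfrac13 & 0 \\ 0 & \tfrac13 & \tfrac23 & \tfrac13 & \tfrac23 & 1 \\ 0 & \tfrac18 & -\tfrac14 & \tfrac14 & -\tfrac18 & 0 \\ 0 & -\tfrac1{24} & -\tfrac1{12} & \tfrac1{12} & \tfrac1{24} & 0 \\ 0 & \tfrac1{20}(5+\sqrt5) & \tfrac15\sqrt5 & \tfrac15\sqrt5 & \tfrac1{20}(5+\sqrt5) & 0 \\ 0 & \tfrac1{20}(5-\sqrt5) & -\tfrac15\sqrt5 & -\tfrac15\sqrt5 & \tfrac1{20}(5-\sqrt5) & 0 \end{bmatrix}$$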


We will not write out the matrix product in 12, as it is rather unwieldy. However, if a specific vector $\mathbf{x}^{(0)}$ is
given, the calculation for $\mathbf{x}^{(n)}$ is not too cumbersome (see Exercise 6).


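Because the absolute values of the last four diagonal entries of D are less than 1, we see that as n tends to
infinity,

$$D^n \to \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

And so, from Equation 12,

$$\mathbf{x}^{(n)} \to P \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} P^{-1}\mathbf{x}^{(0)}$$

Performing the matrix multiplication on the right, we obtain (verify)

$$\mathbf{x}^{(n)} \to \begin{bmatrix} a_0 + \tfrac23 b_0 + \tfrac13 c_0 + \tfrac23 d_0 + \tfrac13 e_0 \\ 0 \\ 0 \\ 0 \\ 0 \\ f_0 + \tfrac13 b_0 + \tfrac23 c_0 + \tfrac13 d_0 + \tfrac23 e_0 \end{bmatrix} \tag{13}$$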


That is, in the limit all sibling pairs will be either type (A, AA) or type (a, aa). For example, if the initial
parents are type (A, Aa) (that is, $b_0 = 1$ and $a_0 = c_0 = d_0 = e_0 = f_0 = 0$), then as n tends to infinity,

$$\mathbf{x}^{(n)} \to \begin{bmatrix} \tfrac23 \\ 0 \\ 0 \\ 0 \\ 0 \\ \tfrac13 \end{bmatrix}$$

Thus, in the limit there is probability 2/3 that the sibling pairs will be (A, AA), and probability 1/3 that they will
be (a, aa).
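
The limit in Equation 13 can also be observed by simply iterating Equation 10. The Python sketch below is illustrative only (the helper names are assumptions, not from the text); it starts from parents of type (A, Aa), applies the transition matrix 100 times, and compares the result with the limiting vector that Equation 13 predicts.

```python
from fractions import Fraction as F

q = F(1, 4)
# Transition matrix M of the inbreeding program; rows and columns are ordered
# (A,AA), (A,Aa), (A,aa), (a,AA), (a,Aa), (a,aa).
M = [[1, q, 0, 0, 0, 0],
     [0, q, 0, 1, q, 0],
     [0, 0, 0, 0, q, 0],
     [0, q, 0, 0, 0, 0],
     [0, q, 1, 0, q, 0],
     [0, 0, 0, 0, q, 1]]

def step(x):
    """One generation: return M x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def limit_from_formula(x0):
    """Limiting vector predicted by Equation 13."""
    a, b, c, d, e, f = x0
    top = a + F(2, 3) * b + F(1, 3) * c + F(2, 3) * d + F(1, 3) * e
    bottom = f + F(1, 3) * b + F(2, 3) * c + F(1, 3) * d + F(2, 3) * e
    return [top, 0, 0, 0, 0, bottom]

x = [F(0), F(1), F(0), F(0), F(0), F(0)]   # initial parents of type (A, Aa)
target = limit_from_formula(x)
for _ in range(100):
    x = step(x)

print([round(float(v), 6) for v in x])
print([round(float(v), 6) for v in target])
```

Convergence is geometric, governed by the largest eigenvalue of absolute value less than 1, namely $\tfrac14(1+\sqrt5) \approx 0.809$.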


Exercise  Set  10.16 

1. Show that if $M = PDP^{-1}$, then $M^n = PD^nP^{-1}$ for n = 1, 2, ....

2.  In  Example  1 suppose  that  the  plants  are  always  fertilized  with  a plant  of  genotype  Aa  rather  than  one  of 
genotype  AA.  Derive  formulas  for  the  fractions  of  the  plants  of  genotypes  AA,  Aa,  and  aa  in  the  nth 
generation.  Also,  find  the  limiting  genotype  distribution  as  n tends  to  infinity. 


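Answer:

$$\begin{aligned} a_n &= \tfrac14 + \left(\tfrac12\right)^{n+1}(a_0 - c_0) \\ b_n &= \tfrac12 \\ c_n &= \tfrac14 - \left(\tfrac12\right)^{n+1}(a_0 - c_0) \end{aligned} \qquad n = 1, 2, \ldots$$

$$a_n \to \tfrac14, \qquad b_n \to \tfrac12, \qquad c_n \to \tfrac14 \qquad \text{as } n \to \infty$$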


3. In Example 1 suppose that the initial plants are fertilized with genotype AA, the first generation is
fertilized with genotype Aa, the second generation is fertilized with genotype AA, and this alternating
pattern of fertilization is kept up. Find formulas for the fractions of the plants of genotypes AA, Aa, and aa
in the nth generation.

Answer:

$$\begin{aligned} a_{2n+1} &= \tfrac23 + \frac{1}{6(4)^n}(2a_0 - b_0 - 4c_0) \\ b_{2n+1} &= \tfrac13 - \frac{1}{6(4)^n}(2a_0 - b_0 - 4c_0) \\ c_{2n+1} &= 0 \end{aligned} \qquad n = 0, 1, 2, \ldots$$

$$\begin{aligned} a_{2n} &= \tfrac5{12} + \frac{1}{6(4)^n}(2a_0 - b_0 - 4c_0) \\ b_{2n} &= \tfrac12 \\ c_{2n} &= \tfrac1{12} - \frac{1}{6(4)^n}(2a_0 - b_0 - 4c_0) \end{aligned} \qquad n = 1, 2, \ldots$$


4.  In  the  section  on  autosomal  recessive  diseases,  find  the  eigenvalues  and  eigenvectors  of  the  matrix  M and 
verify  Equation  7. 

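Answer:

Eigenvalues: $\lambda_1 = 1, \ \lambda_2 = \tfrac12$; eigenvectors: $\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \ \mathbf{e}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$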


5.  Suppose  that  a breeder  has  an  animal  population  in  which  25%  of  the  population  are  carriers  of  an 
autosomal  recessive  disease.  If  the  breeder  allows  the  animals  to  mate  irrespective  of  their  genotype,  use 
Equation  9 to  calculate  the  number  of  generations  required  for  the  percentage  of  carriers  to  fall  from  25% 
to  10%.  If  the  breeder  instead  implements  the  controlled-mating  program  determined  by  Equation  8,  what 
will  the  percentage  of  carriers  be  after  the  same  number  of  generations? 


Answer: 


12  generations;  .006% 

6. In the section on X-linked inheritance, suppose that the initial parents are equally likely to be of any of the
six possible genotype parents; that is,

$$\mathbf{x}^{(0)} = \begin{bmatrix} \tfrac16 \\ \tfrac16 \\ \tfrac16 \\ \tfrac16 \\ \tfrac16 \\ \tfrac16 \end{bmatrix}$$

Using Equation 12, calculate $\mathbf{x}^{(n)}$ and also calculate the limit of $\mathbf{x}^{(n)}$ as n tends to infinity.

Answer: 


7. From 13 show that under X-linked inheritance with inbreeding, the probability that the limiting sibling
pairs  will  be  of  type  (A,  AA)  is  the  same  as  the  proportion  of  A genes  in  the  initial  population. 

8.  In  X-linked  inheritance  suppose  that  none  of  the  females  of  genotype  Aa  survive  to  maturity.  Under 
inbreeding  the  possible  sibling  pairs  are  then 

(A, AA), (A, aa), (a, AA), and (a, aa)

Find the transition matrix that describes how the genotype distribution changes in one generation.

Answer:

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

9. Derive the matrix M in Equation 10 from Table 2.

Technology  Exercises 

The  following  exercises  are  designed  to  be  solved  using  a technology  utility.  Typically,  this  will  be 
MATLAB,  Mathematica,  Maple,  Derive,  or  Mathcad,  but  it  may  also  be  some  other  type  of  linear  algebra 
software  or  a scientific  calculator  with  some  linear  algebra  capabilities.  For  each  exercise  you  will  need  to 
read  the  relevant  documentation  for  the  particular  utility  you  are  using.  The  goal  of  these  exercises  is  to 
provide  you  with  a basic  proficiency  with  your  technology  utility.  Once  you  have  mastered  the  techniques  in 
these  exercises,  you  will  be  able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the  regular 
exercise  sets. 


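T1.

(a) Use a computer to verify that the eigenvalues and eigenvectors of

$$M = \begin{bmatrix} 1 & \tfrac14 & 0 & 0 & 0 & 0 \\ 0 & \tfrac14 & 0 & 1 & \tfrac14 & 0 \\ 0 & 0 & 0 & 0 & \tfrac14 & 0 \\ 0 & \tfrac14 & 0 & 0 & 0 & 0 \\ 0 & \tfrac14 & 1 & 0 & \tfrac14 & 0 \\ 0 & 0 & 0 & 0 & \tfrac14 & 1 \end{bmatrix}$$

as given in the text are correct.

(b) Starting with $\mathbf{x}^{(n)} = M\mathbf{x}^{(n-1)}$ and the assumption that

$$\lim_{n \to \infty} \mathbf{x}^{(n)} = \mathbf{x}$$

exists, we must have

$$\lim_{n \to \infty} \mathbf{x}^{(n)} = M \lim_{n \to \infty} \mathbf{x}^{(n-1)} \qquad \text{or} \qquad \mathbf{x} = M\mathbf{x}$$

This suggests that $\mathbf{x}$ can be solved directly using the equation $(M - I)\mathbf{x} = \mathbf{0}$. Use a computer to solve the
equation $\mathbf{x} = M\mathbf{x}$, where

$$\mathbf{x} = \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \end{bmatrix}$$

and a + b + c + d + e + f = 1; compare your results to Equation 13. Explain why the solution to
$(M - I)\mathbf{x} = \mathbf{0}$ along with a + b + c + d + e + f = 1 is not specific enough to determine $\lim_{n \to \infty} \mathbf{x}^{(n)}$.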

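T2.

(a) Given

$$P = \begin{bmatrix} 1 & 0 & -1 & 1 & \tfrac14(-3-\sqrt5) & \tfrac14(-3+\sqrt5) \\ 0 & 0 & 2 & -6 & 1 & 1 \\ 0 & 0 & -1 & -3 & \tfrac14(-1+\sqrt5) & \tfrac14(-1-\sqrt5) \\ 0 & 0 & 1 & 3 & \tfrac14(-1+\sqrt5) & \tfrac14(-1-\sqrt5) \\ 0 & 0 & -2 & 6 & 1 & 1 \\ 0 & 1 & 1 & -1 & \tfrac14(-3-\sqrt5) & \tfrac14(-3+\sqrt5) \end{bmatrix}$$

from Equation 12 and

$$\lim_{n \to \infty} D^n = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

use a computer to show that

$$\lim_{n \to \infty} M^n = \begin{bmatrix} 1 & \tfrac23 & \tfrac13 & \tfrac23 & \tfrac13 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \tfrac13 & \tfrac23 & \tfrac13 & \tfrac23 & 1 \end{bmatrix}$$

(b) Use a computer to calculate $M^n$ for n = 10, 20, 30, 40, 50, 60, 70, and then compare your results to the
limit in part (a).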
10.17 Age-Specific Population Growth

In  this  section  we  investigate,  using  the  Leslie  matrix  model,  the  growth  over  time  of  a female  population  that 
is  divided  into  age  classes.  We  then  determine  the  limiting  age  distribution  and  growth  rate  of  the  population. 


Prerequisites 

Eigenvalues  and  Eigenvectors 
Diagonalization  of  a Matrix 
Intuitive  Understanding  of  Limits 


One  of  the  most  common  models  of  population  growth  used  by  demographers  is  the  so-called  Leslie  model 
developed  in  the  1940s.  This  model  describes  the  growth  of  the  female  portion  of  a human  or  animal 
population.  In  this  model  the  females  are  divided  into  age  classes  of  equal  duration.  To  be  specific,  suppose 
that  the  maximum  age  attained  by  any  female  in  the  population  is  L years  (or  some  other  time  unit)  and  we 
divide the population into n age classes. Then each class is L/n years in duration. We label the age classes
according  to  Table  1 . 

Table 1

Age Class    Age Interval
1            [0, L/n)
2            [L/n, 2L/n)
3            [2L/n, 3L/n)
...          ...
n − 1        [(n − 2)L/n, (n − 1)L/n)
n            [(n − 1)L/n, L]

Suppose that we know the number of females in each of the n classes at time t = 0. In particular, let there be
$x_1^{(0)}$ females in the first class, $x_2^{(0)}$ females in the second class, and so forth. With these n numbers we form a
column vector:

$$\mathbf{x}^{(0)} = \begin{bmatrix} x_1^{(0)} \\ x_2^{(0)} \\ \vdots \\ x_n^{(0)} \end{bmatrix}$$

We  call  this  vector  the  initial  age  distribution  vector. 


As  time  progresses,  the  number  of  females  within  each  of  the  n classes  changes  because  of  three  biological 
processes:  birth,  death,  and  aging.  By  describing  these  three  processes  quantitatively,  we  will  see  how  to 
project  the  initial  age  distribution  vector  into  the  future. 


The easiest way to study the aging process is to observe the population at discrete times, say,
$t_0, t_1, t_2, \ldots, t_k, \ldots$. The Leslie model requires that the duration between any two successive observation
times be the same as the duration of the age intervals. Therefore, we set

$$\begin{aligned} t_0 &= 0 \\ t_1 &= L/n \\ t_2 &= 2L/n \\ &\ \vdots \\ t_k &= kL/n \\ &\ \vdots \end{aligned}$$

With this assumption, all females in the (i + 1)-st class at time $t_{k+1}$ were in the ith class at time $t_k$.


The  birth  and  death  processes  between  two  successive  observation  times  can  be  described  by  means  of  the 
following  demographic  parameters: 


$a_i$ (i = 1, 2, ..., n): The average number of daughters born to each female during the
time she is in the ith age class

$b_i$ (i = 1, 2, ..., n − 1): The fraction of females in the ith age class that can be expected to
survive and pass into the (i + 1)-st age class

By their definitions, we have that

(i) $a_i \ge 0$ for i = 1, 2, ..., n

(ii) $0 < b_i \le 1$ for i = 1, 2, ..., n − 1

Note that we do not allow any $b_i$ to equal zero, because then no females would survive beyond the ith age
class. We also assume that at least one $a_i$ is positive so that some births occur. Any age class for which the
corresponding value of $a_i$ is positive is called a fertile age class.


We next define the age distribution vector $\mathbf{x}^{(k)}$ at time $t_k$ by

$$\mathbf{x}^{(k)} = \begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \\ \vdots \\ x_n^{(k)} \end{bmatrix}$$

where $x_i^{(k)}$ is the number of females in the ith age class at time $t_k$. Now, at time $t_k$, the females in the first age
class are just those daughters born between times $t_{k-1}$ and $t_k$. Thus, we can write

{number of females in class 1 at time $t_k$}
  = {number of daughters born to females in class 1 between times $t_{k-1}$ and $t_k$}
  + {number of daughters born to females in class 2 between times $t_{k-1}$ and $t_k$}
  + ...
  + {number of daughters born to females in class n between times $t_{k-1}$ and $t_k$}

or, mathematically,

$$x_1^{(k)} = a_1 x_1^{(k-1)} + a_2 x_2^{(k-1)} + \cdots + a_n x_n^{(k-1)} \tag{1}$$


The females in the (i + 1)-st age class (i = 1, 2, ..., n − 1) at time $t_k$ are those females in the ith class at
time $t_{k-1}$ who are still alive at time $t_k$. Thus,

{number of females in class i + 1 at time $t_k$}
  = {fraction of females in class i who survive and pass into class i + 1}
  × {number of females in class i at time $t_{k-1}$}

or, mathematically,

$$x_{i+1}^{(k)} = b_i x_i^{(k-1)}, \qquad i = 1, 2, \ldots, n-1 \tag{2}$$


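Using matrix notation, we can write Equations 1 and 2 as

$$\begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \\ x_3^{(k)} \\ \vdots \\ x_n^{(k)} \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\ b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & b_2 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-1} & 0 \end{bmatrix} \begin{bmatrix} x_1^{(k-1)} \\ x_2^{(k-1)} \\ x_3^{(k-1)} \\ \vdots \\ x_n^{(k-1)} \end{bmatrix}$$

or more compactly as

$$\mathbf{x}^{(k)} = L\mathbf{x}^{(k-1)}, \qquad k = 1, 2, \ldots \tag{3}$$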


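where L is the Leslie matrix

$$L = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\ b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & b_2 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-1} & 0 \end{bmatrix} \tag{4}$$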


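From Equation 3 it follows that

$$\begin{aligned} \mathbf{x}^{(1)} &= L\mathbf{x}^{(0)} \\ \mathbf{x}^{(2)} &= L\mathbf{x}^{(1)} = L^2\mathbf{x}^{(0)} \\ \mathbf{x}^{(3)} &= L\mathbf{x}^{(2)} = L^3\mathbf{x}^{(0)} \\ &\ \vdots \\ \mathbf{x}^{(k)} &= L\mathbf{x}^{(k-1)} = L^k\mathbf{x}^{(0)} \end{aligned} \tag{5}$$

Thus, if we know the initial age distribution $\mathbf{x}^{(0)}$ and the Leslie matrix L, we can determine the female age
distribution at any later time.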

EXAMPLE  1 Female  Age  Distribution  for  Animals 


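Suppose that the oldest age attained by the females in a certain animal population is 15 years
and we divide the population into three age classes with equal durations of five years. Let the
Leslie matrix for this population be

$$L = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix}$$

If there are initially 1000 females in each of the three age classes, then from Equation 3 we
have

$$\mathbf{x}^{(0)} = \begin{bmatrix} 1000 \\ 1000 \\ 1000 \end{bmatrix}$$

$$\mathbf{x}^{(1)} = L\mathbf{x}^{(0)} = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix} \begin{bmatrix} 1000 \\ 1000 \\ 1000 \end{bmatrix} = \begin{bmatrix} 7000 \\ 500 \\ 250 \end{bmatrix}$$

$$\mathbf{x}^{(2)} = L\mathbf{x}^{(1)} = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix} \begin{bmatrix} 7000 \\ 500 \\ 250 \end{bmatrix} = \begin{bmatrix} 2750 \\ 3500 \\ 125 \end{bmatrix}$$

$$\mathbf{x}^{(3)} = L\mathbf{x}^{(2)} = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix} \begin{bmatrix} 2750 \\ 3500 \\ 125 \end{bmatrix} = \begin{bmatrix} 14{,}375 \\ 1375 \\ 875 \end{bmatrix}$$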

Thus,  after  15  years  there  are  14,375  females  between  0 and  5 years  of  age,  1375  females 
between  5 and  10  years  of  age,  and  875  females  between  10  and  15  years  of  age. 
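
The projection in Example 1 is easy to reproduce by machine. The following Python sketch (an illustration; the function name `project` is an assumption, not from the text) applies $\mathbf{x}^{(k)} = L\mathbf{x}^{(k-1)}$ three times with exact rational arithmetic.

```python
from fractions import Fraction as F

# Leslie matrix of Example 1: birth parameters in the first row,
# survival fractions on the subdiagonal.
L = [[F(0),    F(4),    F(3)],
     [F(1, 2), F(0),    F(0)],
     [F(0),    F(1, 4), F(0)]]

def project(L, x, steps):
    """Return [x(0), x(1), ..., x(steps)], where x(k) = L x(k-1)."""
    history = [list(x)]
    for _ in range(steps):
        x = [sum(l * xi for l, xi in zip(row, x)) for row in L]
        history.append(x)
    return history

history = project(L, [1000, 1000, 1000], 3)
for k, xk in enumerate(history):
    print(f"after {5 * k:2d} years:", [int(v) for v in xk])
```

Each time step corresponds to five years, the common duration of the age classes and of the interval between observations.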


Limiting  Behavior 


Although  Equation  5 gives  the  age  distribution  of  the  population  at  any  time,  it  does  not  immediately  give  a 
general  picture  of  the  dynamics  of  the  growth  process.  For  this  we  need  to  investigate  the  eigenvalues  and 
eigenvectors  of  the  Leslie  matrix.  The  eigenvalues  of  L are  the  roots  of  its  characteristic  polynomial.  As  we 
ask  you  to  verify  in  Exercise  2,  this  characteristic  polynomial  is 

$$p(\lambda) = |\lambda I - L| = \lambda^n - a_1\lambda^{n-1} - a_2 b_1 \lambda^{n-2} - a_3 b_1 b_2 \lambda^{n-3} - \cdots - a_n b_1 b_2 \cdots b_{n-1}$$

To  analyze  the  roots  of  this  polynomial,  it  will  be  convenient  to  introduce  the  function 


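$$q(\lambda) = \frac{a_1}{\lambda} + \frac{a_2 b_1}{\lambda^2} + \frac{a_3 b_1 b_2}{\lambda^3} + \cdots + \frac{a_n b_1 b_2 \cdots b_{n-1}}{\lambda^n} \tag{6}$$

Using this function, the characteristic equation $p(\lambda) = 0$ can be written (verify)

$$q(\lambda) = 1 \qquad \text{for } \lambda \ne 0 \tag{7}$$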


Because all the $a_i$ and $b_i$ are nonnegative, we see that $q(\lambda)$ is monotonically decreasing for $\lambda$ greater than
zero. Furthermore, $q(\lambda)$ has a vertical asymptote at $\lambda = 0$ and approaches zero as $\lambda \to \infty$. Consequently, as
Figure 10.17.1 indicates, there is a unique $\lambda$, say $\lambda = \lambda_1$, such that $q(\lambda_1) = 1$. That is, the matrix L has a
unique positive eigenvalue. It can also be shown (see Exercise 3) that $\lambda_1$ has multiplicity 1; that is, $\lambda_1$ is not a
repeated root of the characteristic equation. Although we omit the computational details, you can verify that
an eigenvector corresponding to $\lambda_1$ is

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1 b_2/\lambda_1^2 \\ b_1 b_2 b_3/\lambda_1^3 \\ \vdots \\ b_1 b_2 \cdots b_{n-1}/\lambda_1^{n-1} \end{bmatrix} \tag{8}$$

Because $\lambda_1$ has multiplicity 1, its corresponding eigenspace has dimension 1 (Exercise 3), and so any
eigenvector corresponding to it is some multiple of $\mathbf{x}_1$. We can summarize these results in the following
theorem.


Figure  10.17.1 


THEOREM 10.17.1 Existence of a Positive Eigenvalue

A Leslie matrix L has a unique positive eigenvalue $\lambda_1$. This eigenvalue has multiplicity 1 and an
eigenvector $\mathbf{x}_1$ all of whose entries are positive.
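
Because $q(\lambda)$ is decreasing for $\lambda > 0$, the equation $q(\lambda) = 1$ lends itself to a simple bisection search. The sketch below is one possible way to locate the positive eigenvalue numerically, not a method prescribed by the text; the function names, the bracketing strategy, and the tolerance are assumptions. It is applied to the birth and survival parameters of Example 1.

```python
def q(lam, a, b):
    """Evaluate q(lam) = a1/lam + a2*b1/lam^2 + ... + an*b1*...*b(n-1)/lam^n."""
    total, survival = 0.0, 1.0
    for i, ai in enumerate(a):
        if i > 0:
            survival *= b[i - 1]          # running product b1 * b2 * ... * b_i
        total += ai * survival / lam ** (i + 1)
    return total

def positive_eigenvalue(a, b, tol=1e-12):
    """Solve q(lam) = 1 by bisection, using that q decreases for lam > 0."""
    lo, hi = 1e-9, 1.0
    while q(hi, a, b) > 1:                # enlarge the bracket until q(hi) <= 1
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if q(mid, a, b) > 1:
            lo = mid                      # root lies to the right of mid
        else:
            hi = mid                      # root lies at or to the left of mid
    return (lo + hi) / 2

a = [0, 4, 3]          # birth parameters of Example 1
b = [0.5, 0.25]        # survival parameters of Example 1
lam1 = positive_eigenvalue(a, b)
x1 = [1, b[0] / lam1, b[0] * b[1] / lam1 ** 2]   # eigenvector from Equation 8
print(lam1, x1)
```

For these parameters the search settles on a value very close to 3/2, and the eigenvector entries are close to 1, 1/3, and 1/18.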


We will now show that the long-term behavior of the age distribution of the population is determined by the
positive eigenvalue $\lambda_1$ and its eigenvector $\mathbf{x}_1$. In Exercise 9 we ask you to prove the following result.


THEOREM 10.17.2 Eigenvalues of a Leslie Matrix

If $\lambda_1$ is the unique positive eigenvalue of a Leslie matrix L, and $\lambda_k$ is any other real or complex
eigenvalue of L, then $|\lambda_k| \le \lambda_1$.


For our purposes the conclusion in Theorem 10.17.2 is not strong enough; we need $\lambda_1$ to satisfy $|\lambda_k| < \lambda_1$. In
this case $\lambda_1$ would be called the dominant eigenvalue of L. However, as the following example shows, not all
Leslie matrices satisfy this condition.

EXAMPLE  2 Leslie  Matrix  with  No  Dominant  Eigenvalue 


Let

$$L = \begin{bmatrix} 0 & 0 & 6 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac13 & 0 \end{bmatrix}$$

Then the characteristic polynomial of L is

$$p(\lambda) = |\lambda I - L| = \lambda^3 - 1$$

The eigenvalues of L are thus the solutions of $\lambda^3 = 1$, namely,

$$\lambda = 1, \qquad -\tfrac12 + \tfrac{\sqrt3}{2}i, \qquad -\tfrac12 - \tfrac{\sqrt3}{2}i$$

All three eigenvalues have absolute value 1, so the unique positive eigenvalue $\lambda_1 = 1$ is not
dominant. Note that this matrix has the property that $L^3 = I$. This means that for any choice of the
initial age distribution $\mathbf{x}^{(0)}$, we have

$$\mathbf{x}^{(0)} = \mathbf{x}^{(3)} = \mathbf{x}^{(6)} = \cdots = \mathbf{x}^{(3k)} = \cdots$$

The age distribution vector thus oscillates with a period of three time units. Such oscillations (or
population waves, as they are called) could not occur if $\lambda_1$ were dominant, as we will see below.
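
The periodic behavior in Example 2 can be checked directly. In the Python sketch below, the survival fractions 1/2 and 1/3 are an assumption chosen so that the characteristic polynomial is $\lambda^3 - 1$, as in the example; the initial age distribution and the helper names are likewise hypothetical.

```python
from fractions import Fraction as F

# Leslie matrix with a single fertile age class (the third).
L = [[F(0),    F(0),    F(6)],
     [F(1, 2), F(0),    F(0)],
     [F(0),    F(1, 3), F(0)]]

def mat_mul(A, B):
    """Return the matrix product A B of two square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_vec(A, x):
    """Return the matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

L3 = mat_mul(L, mat_mul(L, L))
identity = [[F(int(i == j)) for j in range(3)] for i in range(3)]
print("L^3 equals I:", L3 == identity)

x = [F(1200), F(300), F(100)]       # hypothetical initial age distribution
for k in range(7):
    print(k, [int(v) for v in x])   # the same vector recurs every three steps
    x = mat_vec(L, x)
```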


It is beyond the scope of this book to discuss necessary and sufficient conditions for $\lambda_1$ to be a dominant
eigenvalue. However, we will state the following sufficient condition without proof.


THEOREM 10.17.3 Dominant Eigenvalue

If two successive entries $a_i$ and $a_{i+1}$ in the first row of a Leslie matrix L are nonzero, then the
positive eigenvalue of L is dominant.


Thus,  if  the  female  population  has  two  successive  fertile  age  classes,  then  its  Leslie  matrix  has  a dominant 
eigenvalue.  This  is  always  the  case  for  realistic  populations  if  the  duration  of  the  age  classes  is  sufficiently 
small.  Note  that  in  Example  2 there  is  only  one  fertile  age  class  (the  third),  so  the  condition  of  Theorem 
10.17.3  is  not  satisfied.  In  what  follows,  we  always  assume  that  the  condition  of  Theorem  10.17.3  is  satisfied. 


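Let us assume that L is diagonalizable. This is not really necessary for the conclusions we will draw, but it
does simplify the arguments. In this case, L has n eigenvalues, $\lambda_1, \lambda_2, \ldots, \lambda_n$, not necessarily distinct, and n
linearly independent eigenvectors, $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, corresponding to them. In this listing we place the dominant
eigenvalue $\lambda_1$ first. We construct a matrix P whose columns are the eigenvectors of L:

$$P = [\mathbf{x}_1 \mid \mathbf{x}_2 \mid \mathbf{x}_3 \mid \cdots \mid \mathbf{x}_n]$$

The diagonalization of L is then given by the equation

$$L = P \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} P^{-1}$$

From this it follows that

$$L^k = P \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix} P^{-1}$$

for k = 1, 2, ....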

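For any initial age distribution vector $\mathbf{x}^{(0)}$ we then have

$$L^k\mathbf{x}^{(0)} = P \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix} P^{-1}\mathbf{x}^{(0)}$$

for k = 1, 2, .... Dividing both sides of this equation by $\lambda_1^k$ and using the fact that $\mathbf{x}^{(k)} = L^k\mathbf{x}^{(0)}$, we have

$$\frac{1}{\lambda_1^k}\mathbf{x}^{(k)} = P \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & \left(\frac{\lambda_2}{\lambda_1}\right)^k & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \left(\frac{\lambda_n}{\lambda_1}\right)^k \end{bmatrix} P^{-1}\mathbf{x}^{(0)} \tag{9}$$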


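Because $\lambda_1$ is the dominant eigenvalue, we have $|\lambda_i/\lambda_1| < 1$ for i = 2, 3, ..., n. It follows that

$$(\lambda_i/\lambda_1)^k \to 0 \quad \text{as } k \to \infty \quad \text{for } i = 2, 3, \ldots, n$$

Using this fact, we can take the limit of both sides of 9 to obtain

$$\lim_{k \to \infty} \left\{ \frac{1}{\lambda_1^k}\mathbf{x}^{(k)} \right\} = P \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} P^{-1}\mathbf{x}^{(0)} \tag{10}$$

Let us denote the first entry of the column vector $P^{-1}\mathbf{x}^{(0)}$ by the constant c. As we ask you to show in
Exercise 4, the right side of 10 can be written as $c\mathbf{x}_1$, where c is a positive constant that depends only on the
initial age distribution vector $\mathbf{x}^{(0)}$. Thus, 10 becomes

$$\lim_{k \to \infty} \left\{ \frac{1}{\lambda_1^k}\mathbf{x}^{(k)} \right\} = c\mathbf{x}_1 \tag{11}$$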


Equation 11 gives us the approximation

$$\mathbf{x}^{(k)} \approx c\lambda_1^k\mathbf{x}_1 \tag{12}$$

for large values of k. From 12 we also have

$$\mathbf{x}^{(k-1)} \approx c\lambda_1^{k-1}\mathbf{x}_1 \tag{13}$$

Comparing Equations 12 and 13, we see that

$$\mathbf{x}^{(k)} \approx \lambda_1\mathbf{x}^{(k-1)} \tag{14}$$

for large values of k. This means that for large values of time, each age distribution vector is a scalar multiple
of the preceding age distribution vector, the scalar being the positive eigenvalue of the Leslie matrix.
Consequently, the proportion of females in each of the age classes becomes constant. As we will see in the
following example, these limiting proportions can be determined from the eigenvector $\mathbf{x}_1$.

EXAMPLE  3 Example  1 Revisited 


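The Leslie matrix in Example 1 was

$$L = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac12 & 0 & 0 \\ 0 & \tfrac14 & 0 \end{bmatrix}$$

Its characteristic polynomial is $p(\lambda) = \lambda^3 - 2\lambda - \tfrac38$, and you can verify that the positive
eigenvalue is $\lambda_1 = \tfrac32$. From 8 the corresponding eigenvector $\mathbf{x}_1$ is

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1 b_2/\lambda_1^2 \end{bmatrix} = \begin{bmatrix} 1 \\ \tfrac12 \big/ \tfrac32 \\ \left(\tfrac12\right)\left(\tfrac14\right) \big/ \left(\tfrac32\right)^2 \end{bmatrix} = \begin{bmatrix} 1 \\ \tfrac13 \\ \tfrac1{18} \end{bmatrix}$$

From 14 we have

$$\mathbf{x}^{(k)} \approx \tfrac32\mathbf{x}^{(k-1)}$$

for large values of k. Hence, every five years the number of females in each of the three classes
will increase by about 50%, as will the total number of females in the population.

From 12 we have

$$\mathbf{x}^{(k)} \approx c\left(\tfrac32\right)^k \begin{bmatrix} 1 \\ \tfrac13 \\ \tfrac1{18} \end{bmatrix}$$

Consequently, eventually the females will be distributed among the three age classes in the ratios
$1 : \tfrac13 : \tfrac1{18}$. This corresponds to a distribution of 72% of the females in the first age class, 24% of the
females in the second age class, and 4% of the females in the third age class.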
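
The limiting growth factor and age proportions of Example 3 can be observed by iterating the Leslie matrix many times. The sketch below is illustrative only: the choice of 200 time steps and the variable names are assumptions, made so that the contribution of the non-dominant eigenvalues has become negligible.

```python
# Leslie matrix of Example 1, in floating point.
L = [[0.0, 4.0,  3.0],
     [0.5, 0.0,  0.0],
     [0.0, 0.25, 0.0]]

def step(x):
    """One time step: return L x."""
    return [sum(l * xi for l, xi in zip(row, x)) for row in L]

x = [1000.0, 1000.0, 1000.0]
growth = None
for _ in range(200):
    x_next = step(x)
    growth = sum(x_next) / sum(x)     # growth factor of the total population
    x = x_next

shares = [v / sum(x) for v in x]      # proportion of females in each age class
print(round(growth, 6), [round(s, 6) for s in shares])
```

The growth factor approaches the positive eigenvalue, and the proportions approach the entries of the eigenvector $\mathbf{x}_1$ after they are scaled to sum to 1.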


EXAMPLE  4 Female  Age  Distribution  for  Humans 

In  this  example  we  use  birth  and  death  parameters  from  the  year  1965  for  Canadian  females. 
Because  few  women  over  50  years  of  age  bear  children,  we  restrict  ourselves  to  the  portion  of  the 
female  population  between  0 and  50  years  of  age.  The  data  are  for  5-year  age  classes,  so  there  are 
total  of  10  age  classes.  Rather  than  writing  out  the  10  x 10  Leslie  matrix  in  full,  we  list  the  birth 
and  death  parameters  as  follows: 


Age Interval    a_i        b_i

[0, 5)          0.00000    0.99651
[5, 10)         0.00024    0.99820
[10, 15)        0.05861    0.99802
[15, 20)        0.28608    0.99729
[20, 25)        0.44791    0.99694
[25, 30)        0.36399    0.99621
[30, 35)        0.22259    0.99460
[35, 40)        0.10457    0.99184
[40, 45)        0.02826    0.98700
[45, 50)        0.00240    —

Using  numerical  techniques,  we  can  approximate  the  positive  eigenvalue  and  corresponding 
eigenvector  by 


$$\lambda_1 = 1.07622$$

and

$$\mathbf{x}_1 = \begin{bmatrix} 1.00000 \\ 0.92594 \\ 0.85881 \\ 0.79641 \\ 0.73800 \\ 0.68364 \\ 0.63281 \\ 0.58482 \\ 0.53897 \\ 0.49429 \end{bmatrix}$$


Thus,  if  Canadian  women  continued  to  reproduce  and  die  as  they  did  in  1965,  eventually  every  5 
years their numbers would increase by 7.622%. From the eigenvector $\mathbf{x}_1$, we see that, in the limit,
for  every  100,000  females  between  0 and  5 years  of  age,  there  will  be  92,594  females  between  5 
and  10  years  of  age,  85,881  females  between  10  and  15  years  of  age,  and  so  forth. 
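The "numerical techniques" mentioned in Example 4 are not specified. One simple possibility, sketched below in Python as an assumption rather than as the method actually used, is bisection on the equation $q(\lambda) = 1$, where $q(\lambda) = a_1/\lambda + a_2 b_1/\lambda^2 + \cdots + a_n b_1 \cdots b_{n-1}/\lambda^n$ is decreasing for $\lambda > 0$. The function names and the bracketing interval are our own choices.

```python
# Birth parameters a_i and survival parameters b_i from the table in Example 4.
a = [0.00000, 0.00024, 0.05861, 0.28608, 0.44791,
     0.36399, 0.22259, 0.10457, 0.02826, 0.00240]
b = [0.99651, 0.99820, 0.99802, 0.99729, 0.99694,
     0.99621, 0.99460, 0.99184, 0.98700]


def q(lam, a, b):
    """q(lam) = a1/lam + a2*b1/lam**2 + ... + an*b1*...*b(n-1)/lam**n."""
    total, survival = 0.0, 1.0
    for i, birth in enumerate(a):
        if i > 0:
            survival *= b[i - 1]
        total += birth * survival / lam ** (i + 1)
    return total


def positive_eigenvalue(a, b, lo=1e-6, hi=10.0, steps=200):
    """Bisection on q(lam) = 1, assuming the root lies between lo and hi."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if q(mid, a, b) > 1:
            lo = mid      # q is decreasing, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


lam1 = positive_eigenvalue(a, b)

# Eigenvector x1 = (1, b1/lam1, b1*b2/lam1**2, ...).
x1 = [1.0]
for survive in b:
    x1.append(x1[-1] * survive / lam1)

print(round(lam1, 4))
print([round(v, 4) for v in x1])
```

The computed eigenvalue and eigenvector should agree with the values quoted in the example to about four decimal places.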


Let us look again at Equation 12, which gives the age distribution vector of the population for large times:

$$\mathbf{x}^{(k)} \approx c\,\lambda_1^k\,\mathbf{x}_1 \tag{15}$$

Three cases arise according to the value of the positive eigenvalue $\lambda_1$:

(i) The population is eventually increasing if $\lambda_1 > 1$.

(ii) The population is eventually decreasing if $\lambda_1 < 1$.

(iii) The population eventually stabilizes if $\lambda_1 = 1$.

The case $\lambda_1 = 1$ is particularly interesting because it determines a population that has zero population
growth. For any initial age distribution, the population approaches a limiting age distribution that is some
multiple of the eigenvector $\mathbf{x}_1$. From Equations 6 and 7, we see that $\lambda_1 = 1$ is an eigenvalue if and only if

$$a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} = 1 \tag{16}$$

The expression

$$R = a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} \tag{17}$$

is called the net reproduction rate of the population. (See Exercise 5 for a demographic interpretation of R.)
Thus, we can say that a population has zero population growth if and only if its net reproduction rate is 1.
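Equation 17 translates directly into a short loop. The Python sketch below is an added illustration; the function names are hypothetical, and the classification of the long-term trend relies on the result stated in Exercise 6 (increasing when R > 1, decreasing when R < 1). It is applied to the birth and survival parameters of Example 1, assumed to be a = (0, 4, 3) and b = (1/2, 1/4).

```python
def net_reproduction_rate(a, b):
    """R = a1 + a2*b1 + a3*b1*b2 + ... + an*b1*b2*...*b(n-1)  (Equation 17)."""
    rate, survival = 0.0, 1.0
    for i, birth in enumerate(a):
        if i > 0:
            survival *= b[i - 1]   # fraction surviving to reach class i + 1
        rate += birth * survival
    return rate


def long_term_trend(rate, tol=1e-12):
    """Classify the population by comparing R with 1 (see Exercise 6)."""
    if abs(rate - 1) <= tol:
        return "stable"
    return "increasing" if rate > 1 else "decreasing"


R = net_reproduction_rate([0, 4, 3], [0.5, 0.25])
print(R, long_term_trend(R))   # 2.375 increasing
```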


Exercise  Set  10.17 


1.  Suppose  that  a certain  animal  population  is  divided  into  two  age  classes  and  has  a Leslie  matrix 


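(a) Calculate the positive eigenvalue $\lambda_1$ of L and the corresponding eigenvector $\mathbf{x}_1$.

(b) Beginning with the initial age distribution vector

$$\mathbf{x}^{(0)} = \begin{bmatrix} 100 \\ 0 \end{bmatrix}$$

calculate $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, $\mathbf{x}^{(3)}$, $\mathbf{x}^{(4)}$, and $\mathbf{x}^{(5)}$, rounding off to the nearest integer when necessary.

(c) Calculate $\mathbf{x}^{(6)}$ using the exact formula $\mathbf{x}^{(6)} = L\mathbf{x}^{(5)}$ and using the approximation formula $\mathbf{x}^{(6)} \approx \lambda_1 \mathbf{x}^{(5)}$.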


Answer: 


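(a) $\lambda_1 = \tfrac{3}{2}$, $\mathbf{x}_1 = \begin{bmatrix} 1 \\ \tfrac{1}{3} \end{bmatrix}$

(b) $\mathbf{x}^{(1)} = \begin{bmatrix} 100 \\ 50 \end{bmatrix}$, $\mathbf{x}^{(2)} = \begin{bmatrix} 175 \\ 50 \end{bmatrix}$, $\mathbf{x}^{(3)} = \begin{bmatrix} 250 \\ 88 \end{bmatrix}$, $\mathbf{x}^{(4)} = \begin{bmatrix} 382 \\ 125 \end{bmatrix}$, $\mathbf{x}^{(5)} = \begin{bmatrix} 570 \\ 191 \end{bmatrix}$

(c) $\mathbf{x}^{(6)} = L\mathbf{x}^{(5)} = \begin{bmatrix} 857 \\ 285 \end{bmatrix}$, $\mathbf{x}^{(6)} \approx \lambda_1 \mathbf{x}^{(5)} = \begin{bmatrix} 855 \\ 287 \end{bmatrix}$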

2.  Find  the  characteristic  polynomial  of  a general  Leslie  matrix  given  by  Equation  4. 

3. (a) Show that the positive eigenvalue $\lambda_1$ of a Leslie matrix is always simple. Recall that a root $\lambda_0$ of a
polynomial $q(\lambda)$ is simple if and only if $q'(\lambda_0) \neq 0$.

(b) Show that the eigenspace corresponding to $\lambda_1$ has dimension 1.

4. Show that the right side of Equation 10 is $c\mathbf{x}_1$, where c is the first entry of the column vector $P^{-1}\mathbf{x}^{(0)}$.

5. Show that the net reproduction rate R, defined by 17, can be interpreted as the average number of
daughters  born  to  a single  female  during  her  expected  lifetime. 


6.  Show  that  a population  is  eventually  decreasing  if  and  only  if  its  net  reproduction  rate  is  less  than  1 . 
Similarly,  show  that  a population  is  eventually  increasing  if  and  only  if  its  net  reproduction  rate  is  greater 
than  1. 

7.  Calculate  the  net  reproduction  rate  of  the  animal  population  in  Example  1 . 

Answer: 

2.375 

8.  (For  readers  with  a hand  calculator)  Calculate  the  net  reproduction  rate  of  the  Canadian  female 
population  in  Example  4. 

Answer: 

1.49611 

9. (For readers who have read Section 10.1-Section 10.3) Prove Theorem 10.17.2. [Hint: Write $\lambda_k = re^{i\theta}$,
substitute into 7, take the real parts of both sides, and show that $r \le \lambda_1$.]

Technology  Exercises 


The  following  exercises  are  designed  to  be  solved  using  a technology  utility.  Typically,  this  will  be 
MATLAB, Mathematica, Maple, Derive, or Mathcad, but it may also be some other type of linear algebra
software  or  a scientific  calculator  with  some  linear  algebra  capabilities.  For  each  exercise  you  will  need  to 
read  the  relevant  documentation  for  the  particular  utility  you  are  using.  The  goal  of  these  exercises  is  to 
provide  you  with  a basic  proficiency  with  your  technology  utility.  Once  you  have  mastered  the  techniques  in 
these  exercises,  you  will  be  able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the  regular 
exercise  sets. 


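T1. Consider the sequence of Leslie matrices

$$L_2 = \begin{bmatrix} 0 & a \\ b_1 & 0 \end{bmatrix}, \quad
L_3 = \begin{bmatrix} 0 & 0 & a \\ b_1 & 0 & 0 \\ 0 & b_2 & 0 \end{bmatrix}, \quad
L_4 = \begin{bmatrix} 0 & 0 & 0 & a \\ b_1 & 0 & 0 & 0 \\ 0 & b_2 & 0 & 0 \\ 0 & 0 & b_3 & 0 \end{bmatrix}, \quad
L_5 = \begin{bmatrix} 0 & 0 & 0 & 0 & a \\ b_1 & 0 & 0 & 0 & 0 \\ 0 & b_2 & 0 & 0 & 0 \\ 0 & 0 & b_3 & 0 & 0 \\ 0 & 0 & 0 & b_4 & 0 \end{bmatrix}$$

(a) Use a computer to show that

$$L_2^2 = I_2, \quad L_3^3 = I_3, \quad L_4^4 = I_4, \quad L_5^5 = I_5, \ldots$$

for a suitable choice of a in terms of $b_1, b_2, \ldots, b_{n-1}$.

(b) From your results in part (a), conjecture a relationship between a and $b_1, b_2, \ldots, b_{n-1}$ that will make
$L_n^n = I_n$, where

$$L_n = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & a \\ b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & b_2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & b_3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-1} & 0 \end{bmatrix}$$

(c) Determine an expression for $p_n(\lambda) = |\lambda I_n - L_n|$ and use it to show that all eigenvalues of $L_n$ satisfy
$|\lambda| = 1$ when a and $b_1, b_2, \ldots, b_{n-1}$ are related by the equation determined in part (b).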


T2.  Consider  the  sequence  of  Leslie  matrices 


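$$L_3 = \begin{bmatrix} a & ap & ap^2 \\ b & 0 & 0 \\ 0 & b & 0 \end{bmatrix}, \quad
L_4 = \begin{bmatrix} a & ap & ap^2 & ap^3 \\ b & 0 & 0 & 0 \\ 0 & b & 0 & 0 \\ 0 & 0 & b & 0 \end{bmatrix}, \quad
L_5 = \begin{bmatrix} a & ap & ap^2 & ap^3 & ap^4 \\ b & 0 & 0 & 0 & 0 \\ 0 & b & 0 & 0 & 0 \\ 0 & 0 & b & 0 & 0 \\ 0 & 0 & 0 & b & 0 \end{bmatrix}, \ldots$$

$$L_n = \begin{bmatrix} a & ap & ap^2 & \cdots & ap^{n-2} & ap^{n-1} \\ b & 0 & 0 & \cdots & 0 & 0 \\ 0 & b & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b & 0 \end{bmatrix}$$

where $0 < p < 1$, $0 < b < 1$, and $1 < a$.

(a) Choose a value for n (say, n = 8). For various values of a, b, and p, use a computer to determine the
dominant eigenvalue of $L_n$, and then compare your results to the value of $a + bp$.

(b) Show that

$$p_n(\lambda) = |\lambda I_n - L_n| = \lambda^n - a\left(\frac{\lambda^n - (bp)^n}{\lambda - bp}\right)$$

which means that the eigenvalues of $L_n$ must satisfy

$$\lambda^{n+1} - (a + bp)\lambda^n + a(bp)^n = 0$$

(c) Can you now provide a rough proof to explain the fact that $\lambda_1 \approx a + bp$?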

T3. Suppose that a population of mice has a Leslie matrix L over a 1-month period and an initial age
distribution vector $\mathbf{x}^{(0)}$ given by

L = [6 × 6 Leslie matrix; its entries are illegible in the source]

and

$$\mathbf{x}^{(0)} = \begin{bmatrix} 50 \\ 40 \\ 30 \\ 20 \\ 10 \\ 5 \end{bmatrix}$$


(a)  Compute  the  net  reproduction  rate  of  the  population. 

(b) Compute the age distribution vector after 100 months and 101 months, and show that the vector after 101
months is approximately a scalar multiple of the vector after 100 months.

(c)  Compute  the  dominant  eigenvalue  of  L and  its  corresponding  eigenvector.  How  are  they  related  to  your 
results  in  part  (b)? 

(d)  Suppose  you  wish  to  control  the  mouse  population  by  feeding  it  a substance  that  decreases  its  age-specific 
birthrates (the entries in the first row of L) by a constant fraction. What range of fractions would cause the
population eventually to decrease?

10.18  Harvesting  of  Animal  Populations

In  this  section  we  employ  the  Leslie  matrix  model  of  population  growth  to  model  the  sustainable  harvesting 
of  an  animal  population.  We  also  examine  the  effect  of  harvesting  different  fractions  of  different  age  groups. 


Prerequisites 

Age-Specific  Population  Growth  (Section  10.17) 


Harvesting 

In  Section  10.17  we  used  the  Leslie  matrix  model  to  examine  the  growth  of  a female  population  that  was 
divided  into  discrete  age  classes.  In  this  section,  we  investigate  the  effects  of  harvesting  an  animal  population 
growing  according  to  such  a model.  By  harvesting  we  mean  the  removal  of  animals  from  the  population. 
(The  word  harvesting  is  not  necessarily  a euphemism  for  “slaughtering”;  the  animals  may  be  removed  from 
the  population  for  other  purposes.) 

In  this  section  we  restrict  ourselves  to  sustainable  harvesting  policies.  By  this  we  mean  the  following: 


DEFINITION  1 

A harvesting  policy  in  which  an  animal  population  is  periodically  harvested  is  said  to  be  sustainable 
if  the  yield  of  each  harvest  is  the  same  and  the  age  distribution  of  the  population  remaining  after  each 
harvest  is  the  same. 


Thus,  the  animal  population  is  not  depleted  by  a sustainable  harvesting  policy;  only  the  excess  growth  is 
removed. 

As in Section 10.17, we will discuss only the females of the population. If the number of males in each age
class  is  equal  to  the  number  of  females — a reasonable  assumption  for  many  populations — then  our  harvesting 
policies  will  also  apply  to  the  male  portion  of  the  population. 


The  Harvesting  Model 

Figure  10.18.1  illustrates  the  basic  idea  of  the  model.  We  begin  with  a population  having  a particular  age 
distribution.  It  undergoes  a growth  period  that  will  be  described  by  the  Leslie  matrix.  At  the  end  of  the  growth 
period,  a certain  fraction  of  each  age  class  is  harvested  in  such  a way  that  the  unharvested  population  has  the 


same  age  distribution  as  the  original  population.  This  cycle  repeats  after  each  harvest  so  that  the  yield  is 
sustainable.  The  duration  of  the  harvest  is  assumed  to  be  short  in  comparison  with  the  growth  period  so  that 
any  growth  or  change  in  the  population  during  the  harvest  period  can  be  neglected. 


Figure 10.18.1  [Diagram: the population before the growth period grows into the population after the growth period; part of it is harvested, and the part not harvested becomes the population before the next growth period.]

To describe this harvesting model mathematically, let

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

be the age distribution vector of the population at the beginning of the growth period. Thus $x_i$ is the number
of females in the ith class left unharvested. As in Section 10.17, we require that the duration of each age class
be  identical  with  the  duration  of  the  growth  period.  For  example,  if  the  population  is  harvested  once  a year, 
then  the  population  is  divided  into  1-year  age  classes. 


If L is the Leslie matrix describing the growth of the population, then the vector $L\mathbf{x}$ is the age distribution
vector of the population at the end of the growth period, immediately before the periodic harvest. Let $h_i$, for
i = 1, 2, ..., n, be the fraction of females from the ith class that is harvested. We use these n numbers to form
an n × n diagonal matrix

$$H = \begin{bmatrix} h_1 & 0 & 0 & \cdots & 0 \\ 0 & h_2 & 0 & \cdots & 0 \\ 0 & 0 & h_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & h_n \end{bmatrix}$$

which we will call the harvesting matrix. By definition, we have

$$0 \le h_i \le 1 \quad (i = 1, 2, \ldots, n)$$

That is, we can harvest none ($h_i = 0$), all ($h_i = 1$), or some fraction ($0 < h_i < 1$) of each of the n classes.
Because the number of females in the ith class immediately before each harvest is the ith entry $(L\mathbf{x})_i$ of the
vector $L\mathbf{x}$, the ith entry of the column vector

$$HL\mathbf{x} = \begin{bmatrix} h_1 (L\mathbf{x})_1 \\ h_2 (L\mathbf{x})_2 \\ \vdots \\ h_n (L\mathbf{x})_n \end{bmatrix}$$

is the number of females harvested from the ith class.

From  the  definition  of  a sustainable  harvesting  policy,  we  have 


[age distribution at end of growth period] − [harvest] = [age distribution at beginning of growth period]

or, mathematically,

$$L\mathbf{x} - HL\mathbf{x} = \mathbf{x} \tag{1}$$

If we write Equation 1 in the form

$$(I - H)L\mathbf{x} = \mathbf{x} \tag{2}$$

we see that $\mathbf{x}$ must be an eigenvector of the matrix $(I - H)L$ corresponding to the eigenvalue 1. As we will
now show, this places certain restrictions on the values of $h_i$ and $\mathbf{x}$.


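Suppose that the Leslie matrix of the population is

$$L = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\ b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & b_2 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-1} & 0 \end{bmatrix}$$

Then the matrix $(I - H)L$ is (verify)

$$(I - H)L = \begin{bmatrix} (1-h_1)a_1 & (1-h_1)a_2 & (1-h_1)a_3 & \cdots & (1-h_1)a_{n-1} & (1-h_1)a_n \\ (1-h_2)b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & (1-h_3)b_2 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & (1-h_n)b_{n-1} & 0 \end{bmatrix} \tag{3}$$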


Thus, we see that $(I - H)L$ is a matrix with the same mathematical form as a Leslie matrix. In Section 10.17
we showed that a necessary and sufficient condition for a Leslie matrix to have 1 as an eigenvalue is that its
net reproduction rate also be 1 [see Eq. 16 of Section 10.17]. Calculating the net reproduction rate of
$(I - H)L$ and setting it equal to 1, we obtain (verify)

$$(1-h_1)\left[a_1 + a_2 b_1 (1-h_2) + a_3 b_1 b_2 (1-h_2)(1-h_3) + \cdots + a_n b_1 b_2 \cdots b_{n-1}(1-h_2)(1-h_3)\cdots(1-h_n)\right] = 1 \tag{4}$$

This equation places a restriction on the allowable harvesting fractions. Only those values of $h_1, h_2, \ldots, h_n$
that satisfy 4 and that lie in the interval [0, 1] can produce a sustainable yield.

If $h_1, h_2, \ldots, h_n$ do satisfy 4, then the matrix $(I - H)L$ has the desired eigenvalue $\lambda_1 = 1$. Furthermore, this
eigenvalue has multiplicity 1, because the positive eigenvalue of a Leslie matrix always has multiplicity 1
(Theorem 10.17.1). This means that there is only one linearly independent eigenvector $\mathbf{x}$ satisfying Equation
2. [See Exercise 3(b) of Section 10.17.] One possible choice for $\mathbf{x}$ is the following normalized eigenvector:

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1(1-h_2) \\ b_1 b_2 (1-h_2)(1-h_3) \\ b_1 b_2 b_3 (1-h_2)(1-h_3)(1-h_4) \\ \vdots \\ b_1 b_2 \cdots b_{n-1}(1-h_2)(1-h_3)\cdots(1-h_n) \end{bmatrix} \tag{5}$$

Any other solution $\mathbf{x}$ of 2 is a multiple of $\mathbf{x}_1$. Thus, the vector $\mathbf{x}_1$ determines the proportion of females within
each of the n classes after a harvest under a sustainable harvesting policy. But there is an ambiguity in the
total number of females in the population after each harvest. This can be determined by some auxiliary
condition, such as an ecological or economic constraint. For example, for a population economically
supported by the harvester, the largest population the harvester can afford to raise between harvests would
determine the particular constant that $\mathbf{x}_1$ is multiplied by to produce the appropriate vector $\mathbf{x}$ in Equation 2.
For a wild population, the natural habitat of the population would determine how large the total population
could be between harvests.

Summarizing our results so far, we see that there is a wide choice in the values of $h_1, h_2, \ldots, h_n$ that will
produce a sustainable yield. But once these values are selected, the proportional age distribution of the
population after each harvest is uniquely determined by the normalized eigenvector $\mathbf{x}_1$ defined by Equation 5.
We now consider a few particular harvesting strategies of this type.
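The sustainability condition (Equation 4) and the normalized eigenvector (Equation 5) can be checked numerically for any proposed set of harvesting fractions. The Python sketch below is an added illustration with hypothetical function names. It assumes a three-class population with birth rates a = (0, 4, 3), survival rates b = (1/2, 1/4), and a uniform harvesting fraction of 1/3, chosen only because the arithmetic is exact.

```python
from fractions import Fraction as F


def sustainability_sum(a, b, h):
    """Left side of Equation 4 for birth rates a, survival rates b, fractions h."""
    total, weight = F(0), F(1)
    for k in range(len(a)):
        if k > 0:
            weight *= b[k - 1] * (1 - h[k])
        total += a[k] * weight
    return (1 - h[0]) * total


def post_harvest_vector(b, h):
    """Normalized eigenvector x1 of (I - H)L from Equation 5."""
    x = [F(1)]
    for k in range(1, len(h)):
        x.append(x[-1] * b[k - 1] * (1 - h[k]))
    return x


def grow_then_harvest(a, b, h, x):
    """Apply (I - H)L to x without forming the matrices explicitly."""
    grown = [sum(ai * xi for ai, xi in zip(a, x))]    # first entry of L x
    grown += [b[k] * x[k] for k in range(len(b))]     # remaining entries of L x
    return [(1 - hk) * g for hk, g in zip(h, grown)]


a = [F(0), F(4), F(3)]
b = [F(1, 2), F(1, 4)]
h = [F(1, 3)] * 3

x1 = post_harvest_vector(b, h)
print(sustainability_sum(a, b, h) == 1)        # True: Equation 4 holds
print(grow_then_harvest(a, b, h, x1) == x1)    # True: x1 satisfies Equation 2
```

If the first check fails for a given choice of fractions, the policy is not sustainable and the second check will fail as well.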


Uniform  Harvesting 


With  many  populations  it  is  difficult  to  distinguish  or  catch  animals  of  specific  ages.  If  animals  are  caught  at 
random,  we  can  reasonably  assume  that  the  same  fraction  of  each  age  class  is  harvested.  We  therefore  set 

$$h = h_1 = h_2 = \cdots = h_n$$

Equation 2 then reduces to (verify)

$$L\mathbf{x} = \left(\frac{1}{1-h}\right)\mathbf{x}$$

Hence, $1/(1-h)$ must be the unique positive eigenvalue $\lambda_1$ of the Leslie growth matrix L. That is,

$$\lambda_1 = \frac{1}{1-h}$$

Solving for the harvesting fraction h, we obtain

$$h = 1 - (1/\lambda_1) \tag{6}$$

The vector $\mathbf{x}_1$, in this case, is the same as the eigenvector of L corresponding to the eigenvalue $\lambda_1$. From
Equation 8 of Section 10.17, this is

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1 b_2/\lambda_1^2 \\ b_1 b_2 b_3/\lambda_1^3 \\ \vdots \\ b_1 b_2 \cdots b_{n-1}/\lambda_1^{n-1} \end{bmatrix} \tag{7}$$

From 6 we can see that the larger $\lambda_1$ is, the larger is the fraction of animals we can harvest without depleting
the population. Note that we need $\lambda_1 > 1$ in order for the harvesting fraction h to lie in the interval (0, 1).
This is to be expected, because $\lambda_1 > 1$ is the condition that the population be increasing.
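Equations 6 and 7 are simple enough to evaluate directly. The Python sketch below is an added illustration with hypothetical function names. It takes the positive eigenvalue as given (it does not compute it) and uses the value 1.176 quoted for the sheep population in Example 1, together with an assumed three-class population whose eigenvalue is 3/2.

```python
def uniform_harvest_fraction(lam1):
    """Equation 6: h = 1 - 1/lam1, which requires lam1 > 1."""
    if lam1 <= 1:
        raise ValueError("a sustainable uniform harvest needs lam1 > 1")
    return 1 - 1 / lam1


def uniform_age_distribution(b, lam1):
    """Equation 7: x1 = (1, b1/lam1, b1*b2/lam1**2, ...)."""
    x = [1.0]
    for survive in b:
        x.append(x[-1] * survive / lam1)
    return x


# Sheep population: lam1 = 1.176 as quoted in Example 1.
h_sheep = uniform_harvest_fraction(1.176)
print(round(h_sheep, 3))   # 0.15, that is, 15.0%

# Assumed three-class population with b = (1/2, 1/4) and lam1 = 3/2.
h_small = uniform_harvest_fraction(1.5)
x_small = uniform_age_distribution([0.5, 0.25], 1.5)
print(h_small, x_small)
```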

EXAMPLE  1 Harvesting  Sheep 


For a certain species of domestic sheep in New Zealand with a growth period of 1 year, the
following Leslie matrix was found (see G. Caughley, “Parameters for Seasonally Breeding
Populations,” Ecology, 48, 1967, pp. 834-839).

The sheep have a lifespan of 12 years, so they are divided into 12 age classes of duration 1 year
each. By the use of numerical techniques, the unique positive eigenvalue of L can be found to
be

$$\lambda_1 = 1.176$$

From Equation 6, the harvesting fraction h is

$$h = 1 - (1/\lambda_1) = 1 - (1/1.176) = .150$$

Thus, the uniform harvesting policy is one in which 15.0% of the sheep from each of the 12
age classes is harvested every year. From 7 the age distribution vector of the sheep after each
harvest is proportional to


$$\mathbf{x}_1 = \begin{bmatrix} 1.000 \\ 0.719 \\ 0.596 \\ 0.489 \\ 0.395 \\ 0.311 \\ 0.237 \\ 0.171 \\ 0.114 \\ 0.067 \\ 0.032 \\ 0.010 \end{bmatrix} \tag{8}$$


From 8 we see that for every 1000 sheep between 0 and 1 year of age that are not harvested,
there  are  719  sheep  between  1 and  2 years  of  age,  596  sheep  between  2 and  3 years  of  age,  and 
so  forth. 


Harvesting  Only  the  Youngest  Age  Class 

In  some  populations  only  the  youngest  females  are  of  any  economic  value,  so  the  harvester  seeks  to  harvest 
only the females from the youngest age class. Accordingly, let us set

$$h_1 = h, \qquad h_2 = h_3 = \cdots = h_n = 0$$

Equation 4 then reduces to

$$(1 - h)(a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1}) = 1$$

or

$$(1 - h)R = 1$$

where R is the net reproduction rate of the population. [See Equation 17 of Section 10.17.] Solving for h, we
obtain

$$h = 1 - (1/R) \tag{9}$$

Note from this equation that a sustainable harvesting policy is possible only if R > 1. This is reasonable
because only if R > 1 is the population increasing. From Equation 5, the age distribution vector after each
harvest is proportional to the vector

$$\mathbf{x}_1 = \begin{bmatrix} 1 \\ b_1 \\ b_1 b_2 \\ b_1 b_2 b_3 \\ \vdots \\ b_1 b_2 b_3 \cdots b_{n-1} \end{bmatrix} \tag{10}$$
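The quantities in Equations 9 and 10, together with the yield expressed as a fraction of the population just before the harvest, can be computed in a few lines. The Python sketch below is an added illustration; the function name is hypothetical, and the yield calculation uses the fact that $L\mathbf{x}_1$ has first entry R and otherwise agrees with $\mathbf{x}_1$ (see Exercise 3). The sample data are those of an assumed three-class population with a = (0, 4, 3) and b = (1/2, 1/4).

```python
def youngest_class_policy(a, b):
    """Return (h, x1, yield_fraction) when only the first class is harvested."""
    # Cumulative survival products 1, b1, b1*b2, ... (Equation 10).
    x1 = [1.0]
    for survive in b:
        x1.append(x1[-1] * survive)
    rate = sum(ai * xi for ai, xi in zip(a, x1))   # net reproduction rate R
    if rate <= 1:
        raise ValueError("no sustainable harvest: R must exceed 1")
    h = 1 - 1 / rate                               # Equation 9
    # Before the harvest the first class holds R animals; the rest match x1.
    before = [rate] + x1[1:]
    yield_fraction = h * rate / sum(before)
    return h, x1, yield_fraction


h, x1, y = youngest_class_policy([0, 4, 3], [0.5, 0.25])
print(round(h, 3), x1, round(y, 3))   # 0.579 [1.0, 0.5, 0.125] 0.458
```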


EXAMPLE  2 Sustainable  Harvesting  Policy 


Let  us  apply  this  type  of  sustainable  harvesting  policy  to  the  sheep  population  in  Example  1 . 
For  the  net  reproduction  rate  of  the  population  we  find 

$$\begin{aligned} R &= a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1} \\ &= (.000) + (.045)(.845) + \cdots + (.421)(.845)(.975)\cdots(.370) \\ &= 2.514 \end{aligned}$$

From  Equation  9,  the  fraction  of  the  first  age  class  harvested  is 

h = 1 - (1  /R)  = 1 - (1  / 2.514)  = .602 

From  Equation  10,  the  age  distribution  of  the  sheep  population  after  the  harvest  is  proportional 
to  the  vector 


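$$\mathbf{x}_1 = \begin{bmatrix} 1.000 \\ .845 \\ (.845)(.975) \\ (.845)(.975)(.965) \\ \vdots \\ (.845)(.975)\cdots(.370) \end{bmatrix} = \begin{bmatrix} 1.000 \\ 0.845 \\ 0.824 \\ 0.795 \\ 0.755 \\ 0.699 \\ 0.626 \\ 0.532 \\ 0.418 \\ 0.289 \\ 0.162 \\ 0.060 \end{bmatrix} \tag{11}$$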


A direct  calculation  gives  us  the  following  (see  also  Exercise  3): 


$$L\mathbf{x}_1 = \begin{bmatrix} 2.514 \\ 0.845 \\ 0.824 \\ 0.795 \\ 0.755 \\ 0.699 \\ 0.626 \\ 0.532 \\ 0.418 \\ 0.289 \\ 0.162 \\ 0.060 \end{bmatrix} \tag{12}$$


The vector $L\mathbf{x}_1$ is the age distribution vector immediately before the harvest. The total of all
entries in $L\mathbf{x}_1$ is 8.520, so the first entry 2.514 is 29.5% of the total. This means that
immediately  before  each  harvest,  29.5%  of  the  population  is  in  the  youngest  age  class.  Since 
60.2%  of  this  class  is  harvested,  it  follows  that  17.8%  (=  60.2%  of  29.5%)  of  the  entire  sheep 
population  is  harvested  each  year.  This  can  be  compared  with  the  uniform  harvesting  policy  of 
Example  1,  in  which  15.0%  of  the  sheep  population  is  harvested  each  year. 
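The percentages in Example 2 follow from the entries of Equation 12 by elementary arithmetic, which the following Python sketch (an added illustration) reproduces. Note that the three-decimal entries as printed sum to 8.519 rather than the quoted 8.520; the difference is rounding and does not change the percentages at the precision shown.

```python
# Entries of L x1 as printed in Equation 12 (rounded to three decimals).
before_harvest = [2.514, 0.845, 0.824, 0.795, 0.755, 0.699,
                  0.626, 0.532, 0.418, 0.289, 0.162, 0.060]

R = before_harvest[0]              # net reproduction rate, first entry of L x1
h = 1 - 1 / R                      # Equation 9: fraction of class 1 harvested
total = sum(before_harvest)        # population just before the harvest
share_youngest = R / total         # fraction of the population in class 1
yield_fraction = h * share_youngest

print(round(h, 3), round(share_youngest, 3), round(yield_fraction, 3))
# 0.602 0.295 0.178
```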


Optimal  Sustainable  Yield 

We  saw  in  Example  1 that  a sustainable  harvesting  policy  in  which  the  same  fraction  of  each  age  class  is 
harvested produces a yield of 15.0% of the sheep population. In Example 2 we saw that if only the youngest
age class is harvested, the resulting yield is 17.8% of the population. There are many other possible
sustainable  harvesting  policies,  and  each  generally  provides  a different  yield.  It  would  be  of  interest  to  find  a 
sustainable  harvesting  policy  that  produces  the  largest  possible  yield.  Such  a policy  is  called  an  optimal 
sustainable  harvesting  policy,  and  the  resulting  yield  is  called  the  optimal  sustainable  yield.  However, 
determining  the  optimal  sustainable  yield  requires  linear  programming  theory,  which  we  will  not  discuss  here. 
We  refer  you  to  the  following  result,  which  appears  in  J.  R.  Beddington  and  D.  B.  Taylor,  “Optimum  Age 
Specific  Harvesting  of  a Population,”  Biometrics,  29,  1973,  pp.  801-809. 


THEOREM 10.18.1  Optimal Sustainable Yield

An  optimal  sustainable  harvesting  policy  is  one  in  which  either  one  or  two  age  classes  are  harvested. 
If  two  age  classes  are  harvested,  then  the  older  age  class  is  completely  harvested. 


As an illustration, it can be shown that the optimal sustainable yield of the sheep population is attained when

$$h_1 = 0.522, \qquad h_9 = 1.000 \tag{13}$$

and all other values of $h_i$ are zero. Thus, 52.2% of the sheep between 0 and 1 year of age and all the sheep
between 8 and 9 years of age are harvested. As we ask you to show in Exercise 2, the resulting optimal
sustainable yield is 19.9% of the population.


Exercise  Set  10.18 


1.  Let  a certain  animal  population  be  divided  into  three  1-year  age  classes  and  have  as  its  Leslie  matrix 


$$L = \begin{bmatrix} 0 & 4 & 3 \\ \tfrac{1}{2} & 0 & 0 \\ 0 & \tfrac{1}{4} & 0 \end{bmatrix}$$

(a) Find the yield and the age distribution vector after each harvest if the same fraction of each of the
three age classes is harvested every year.

(b) Find the yield and the age distribution vector after each harvest if only the youngest age class is
harvested every year. Also, find the fraction of the youngest age class that is harvested.


Answer: 


(a) Yield = $33\tfrac{1}{3}\%$ of population; $\mathbf{x}_1 = \begin{bmatrix} 1 \\ \tfrac{1}{3} \\ \tfrac{1}{18} \end{bmatrix}$

(b) Yield = 45.8% of population; $\mathbf{x}_1 = \begin{bmatrix} 1 \\ \tfrac{1}{2} \\ \tfrac{1}{8} \end{bmatrix}$; harvest 57.9% of youngest age class


2. For the optimal sustainable harvesting policy described by Equations 13, find the vector $\mathbf{x}_1$ that specifies
the age distribution of the population after each harvest. Also calculate the vector $L\mathbf{x}_1$ and verify that the
optimal sustainable yield is 19.9% of the population.


Answer:

3. Use Equation 10 to show that if only the first age class of an animal population is harvested, then

$$L\mathbf{x}_1 - \mathbf{x}_1 = \begin{bmatrix} R - 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$


where  R is  the  net  reproduction  rate  of  the  population. 


4. If only the Ith class of an animal population is to be periodically harvested (I = 1, 2, ..., n), find the
corresponding harvesting fraction $h_I$.

Answer:

$$h_I = \frac{R - 1}{a_I b_1 b_2 \cdots b_{I-1} + \cdots + a_n b_1 b_2 \cdots b_{n-1}}$$

5. Suppose that all of the Jth class and a certain fraction $h_I$ of the Ith class of an animal population is to be
periodically harvested ($1 \le I < J \le n$). Calculate $h_I$.

Answer:

$$h_I = \frac{a_1 + a_2 b_1 + \cdots + (a_{J-1} b_1 b_2 \cdots b_{J-2}) - 1}{a_I b_1 b_2 \cdots b_{I-1} + \cdots + a_{J-1} b_1 b_2 \cdots b_{J-2}}$$


Technology  Exercises 


The  following  exercises  are  designed  to  be  solved  using  a technology  utility.  Typically,  this  will  be  MATLAB, 
Mathematica,  Maple,  Derive,  or  Mathcad,  but  it  may  also  be  some  other  type  of  linear  algebra  software  or  a 
scientific  calculator  with  some  linear  algebra  capabilities.  For  each  exercise  you  will  need  to  read  the 
relevant  documentation  for  the  particular  utility  you  are  using.  The  goal  of  these  exercises  is  to  provide  you 
with  a basic  proficiency  with  your  technology  utility.  Once  you  have  mastered  the  techniques  in  these 
exercises,  you  will  be  able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the  regular 
exercise  sets. 


T1. The results of Theorem 10.18.1 suggest the following algorithm for determining the optimal sustainable
yield.

1. For each value of i = 1, 2, ..., n, set $h_i = h$ and $h_k = 0$ for $k \neq i$ and calculate the respective yields. These
n calculations give the one-age-class results. Of course, any calculation leading to a value of h not between
0 and 1 is rejected.

2. For each value of i = 1, 2, ..., n − 1 and j = i + 1, i + 2, ..., n, set $h_i = h$, $h_j = 1$, and $h_k = 0$ for $k \neq i, j$
and calculate the respective yields. These $\tfrac{1}{2}n(n-1)$ calculations give the two-age-class results. Of
course, any calculation leading to a value of h not between 0 and 1 is again rejected.

3. Of the yields calculated in steps 1 and 2, the largest is the optimal sustainable yield. Note that there will
be at most

$$n + \tfrac{1}{2}n(n-1) = \tfrac{1}{2}n(n+1)$$

calculations in all. Once again, some of these may lead to a value of h not between 0 and 1 and must
therefore be rejected.

If we use this algorithm for the sheep example in the text, there will be at most $\tfrac{1}{2}(12)(12+1) = 78$
calculations to consider. Use a computer to do the two-age-class calculations for $h_1 = h$, $h_j = 1$, and $h_k = 0$
for $k \neq 1$ or j for j = 2, 3, ..., 12. Construct a summary table consisting of the values of $h_1$ and the
percentage yields using j = 2, 3, ..., 12, which will show that the largest of these yields occurs when j = 9.
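One way to organize the search described in Exercise T1 is sketched below in Python. This is an added illustration, not a prescribed solution. It relies on the closed forms for the harvesting fraction that follow from Equation 4 (they agree with the answers to Exercises 4 and 5), the function names are hypothetical, and because the 12 × 12 sheep matrix is not reproduced here the demonstration uses the three-class population of Exercise 1.

```python
def evaluate_policy(a, b, h):
    """Yield of policy h as a fraction of the population before the harvest."""
    x = [1.0]                                   # eigenvector from Equation 5
    for k in range(1, len(h)):
        x.append(x[-1] * b[k - 1] * (1 - h[k]))
    grown = [sum(ai * xi for ai, xi in zip(a, x))]    # first entry of L x
    grown += [b[k] * x[k] for k in range(len(b))]     # remaining entries of L x
    harvested = sum(hk * g for hk, g in zip(h, grown))
    return harvested / sum(grown)


def optimal_policy(a, b):
    """Search all one- and two-age-class policies; return (yield, h) of the best."""
    n = len(a)
    # c[k] = a_(k+1) * b_1 * ... * b_k, the terms of the net reproduction rate.
    c, survival = [], 1.0
    for k in range(n):
        if k > 0:
            survival *= b[k - 1]
        c.append(a[k] * survival)

    results = []
    for i in range(n):
        # j == n means no class is completely harvested (one-age-class case);
        # j < n means class j + 1 is completely harvested (two-age-class case).
        for j in range(i + 1, n + 1):
            denom = sum(c[i:j])
            if denom == 0:
                continue                        # h is undefined for this pair
            h_val = (sum(c[:j]) - 1) / denom    # solves Equation 4 for h
            if not 0 <= h_val <= 1:
                continue                        # rejected, as in the algorithm
            h = [0.0] * n
            h[i] = h_val
            if j < n:
                h[j] = 1.0
            results.append((evaluate_policy(a, b, h), h))
    return max(results), results


best, all_results = optimal_policy([0, 4, 3], [0.5, 0.25])
print(best)
```

For this small population the search keeps four admissible policies, and the best of them harvests only the youngest class. Running it on the sheep data would require supplying the full lists of birth and survival parameters.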


T2. Using the algorithm in Exercise T1, do the one-age-class calculations for $h_i = h$ and $h_k = 0$ for $k \neq i$ for
i = 1, 2, ..., 12. Construct a summary table consisting of the values of $h_i$ and the percentage yields using
i = 1, 2, ..., 12, which will show that the largest of these yields occurs when i = 9.


T3.  Referring  to  the  mouse  population  in  Exercise  T3  of  Section  10.17,  suppose  that  reducing  the  birthrates 
is  not  practical,  so  you  instead  decide  to  control  the  population  by  uniformly  harvesting  all  of  the  age  classes 
monthly. 

(a)  What  fraction  of  the  population  must  be  harvested  monthly  to  bring  the  mouse  population  to  equilibrium 
eventually? 

(b)  What  is  the  equilibrium  age  distribution  vector  under  this  uniform  harvesting  policy? 

(c) The total number of mice in the original mouse population was 155. What would be the total number of
mice after 5, 10, and 200 months under your uniform harvesting policy?

10.19  A Least  Squares  Model  for  Human  Hearing

In  this  section  we  apply  the  method  of  least  squares  approximation  to  a model  for  human  hearing.  The  use  of  this 
method  is  motivated  by  energy  considerations. 


Prerequisites 

Inner  Product  Spaces 
Orthogonal  Projection 
Fourier  Series  (Section  6.6) 


Anatomy  of  the  Ear 


We  begin  with  a brief  discussion  of  the  nature  of  sound  and  human  hearing.  Figure  10.19.1  is  a schematic  diagram 
of  the  ear  showing  its  three  main  components:  the  outer  ear,  middle  ear,  and  inner  ear.  Sound  waves  enter  the  outer 
ear  where  they  are  channeled  to  the  eardrum,  causing  it  to  vibrate.  Three  tiny  bones  in  the  middle  ear  mechanically 
link  the  eardrum  with  the  snail-shaped  cochlea  within  the  inner  ear.  These  bones  pass  on  the  vibrations  of  the 
eardrum  to  a fluid  within  the  cochlea.  The  cochlea  contains  thousands  of  minute  hairs  that  oscillate  with  the  fluid. 
Those  near  the  entrance  of  the  cochlea  are  stimulated  by  high  frequencies,  and  those  near  the  tip  are  stimulated  by 
low  frequencies.  The  movements  of  these  hairs  activate  nerve  cells  that  send  signals  along  various  neural  pathways 
to  the  brain,  where  the  signals  are  interpreted  as  sound. 


Figure 10.19.1  [Schematic of the ear; labels: sound wave, auditory nerve, to brain.]

The  sound  waves  themselves  are  variations  in  time  of  the  air  pressure.  For  the  auditory  system,  the  most 
elementary  type  of  sound  wave  is  a sinusoidal  variation  in  the  air  pressure.  This  type  of  sound  wave  stimulates  the 
hairs  within  the  cochlea  in  such  a way  that  a nerve  impulse  along  a single  neural  pathway  is  produced  (Figure 
10.19.2).  A sinusoidal  sound  wave  can  be  described  by  a function  of  time 


$$q(t) = A_0 + A\sin(\omega t - \delta) \tag{1}$$

where q(t) is the atmospheric pressure at the eardrum, $A_0$ is the normal atmospheric pressure, A is the maximum
deviation of the pressure from the normal atmospheric pressure, $\omega/2\pi$ is the frequency of the wave in cycles per
second, and $\delta$ is the phase angle of the wave. To be perceived as sound, such sinusoidal waves must have
frequencies  within  a certain  range.  For  humans  this  range  is  roughly  20  cycles  per  second  (cps)  to  20,000  cps. 
Frequencies  outside  this  range  will  not  stimulate  the  hairs  within  the  cochlea  enough  to  produce  nerve  signals. 


Figure  10.19.2 


To  a reasonable  degree  of  accuracy,  the  ear  is  a linear  system.  This  means  that  if  a complex  sound  wave  is  a finite 
sum  of  sinusoidal  components  of  different  amplitudes,  frequencies,  and  phase  angles,  say, 

$$q(t) = A_0 + A_1\sin(\omega_1 t - \delta_1) + A_2\sin(\omega_2 t - \delta_2) + \cdots + A_n\sin(\omega_n t - \delta_n) \tag{2}$$

then  the  response  of  the  ear  consists  of  nerve  impulses  along  the  same  neural  pathways  that  would  be  stimulated  by 
the  individual  components  (Figure  10.19.3). 



Figure  10.19.3 

Let us now consider some periodic sound wave p(t) with period T [i.e., p(t) = p(t + T)] that is not a finite sum
of sinusoidal waves. If we examine the response of the ear to such a periodic wave, we find that it is the same as
the response to some wave that is the sum of sinusoidal waves. That is, there is some sound wave q(t) as given by
Equation 2 that produces the same response as p(t), even though p(t) and q(t) are different functions of time.

We  now  want  to  determine  the  frequencies,  amplitudes,  and  phase  angles  of  the  sinusoidal  components  of  q(t). 
Because q(t) produces the same response as the periodic wave p(t), it is reasonable to expect that q(t) has the
same period T as p(t). This requires that each sinusoidal term in q(t) have period T. Consequently, the frequencies
of the sinusoidal components must be integer multiples of the basic frequency 1/T of the function p(t). Thus, the
$\omega_k$ in Equation 2 must be of the form

$$\omega_k = 2k\pi/T, \quad k = 1, 2, \ldots$$

But because the ear cannot perceive sinusoidal waves with frequencies greater than 20,000 cps, we may omit those
values of k for which $\omega_k/2\pi = k/T$ is greater than 20,000. Thus, q(t) is of the form

$$q(t) = A_0 + A_1\sin\left(\frac{2\pi t}{T} - \delta_1\right) + A_2\sin\left(\frac{4\pi t}{T} - \delta_2\right) + \cdots + A_n\sin\left(\frac{2n\pi t}{T} - \delta_n\right) \tag{3}$$

where n is the largest integer such that n/T is not greater than 20,000.

We now turn our attention to the values of the amplitudes $A_0, A_1, \ldots, A_n$ and the phase angles
$\delta_1, \delta_2, \ldots, \delta_n$ that appear in Equation 3. There is some criterion by which the auditory system
“picks” these values so that q(t) produces the same response as p(t). To examine this criterion, let us set

$$e(t) = p(t) - q(t)$$

If we consider q(t) as an approximation to p(t), then e(t) is the error in this approximation, an error that the ear
cannot perceive. In terms of e(t), the criterion for the determination of the amplitudes and the phase angles is that
the quantity

$$\int_0^T [e(t)]^2\,dt = \int_0^T [p(t) - q(t)]^2\,dt \tag{4}$$

be  as  small  as  possible.  We  cannot  go  into  the  physiological  reasons  for  this,  but  we  note  that  this  expression  is 
proportional  to  the  acoustic  energy  of  the  error  wave  e(t)  over  one  period.  In  other  words,  it  is  the  energy  of  the 
difference  between  the  two  sound  waves  p(t)  and  q(t)  that  determines  whether  the  ear  perceives  any  difference 
between  them.  If  this  energy  is  as  small  as  possible,  then  the  two  waves  produce  the  same  sensation  of  sound. 
Mathematically, the function q(t) that minimizes 4 is the least squares approximation to p(t) from the vector space
C[0, T] of continuous functions on the interval [0, T]. (See Section 6.6.)


Least  squares  approximations  by  continuous  functions  arise  in  a wide  variety  of  engineering  and  scientific 
approximation  problems.  Apart  from  the  acoustics  problem  just  discussed,  some  other  examples  follow. 

Let S(x) be the axial strain distribution in a uniform rod lying along the x-axis from x = 0 to x = l (Figure
10.19.4). The strain energy in the rod is proportional to the integral

$$\int_0^l [S(x)]^2\,dx$$

The  closeness  of  an  approximation  q(x)  to  S(x)  can  be  judged  according  to  the  strain  energy  of  the  difference 
of  the  two  strain  distributions.  That  energy  is  proportional  to 

$$\int_0^l [S(x) - q(x)]^2\,dx$$

which  is  a least  squares  criterion. 

Let E(t) be a periodic voltage across a resistor in an electrical circuit (Figure 10.19.5). The electrical energy
transferred to the resistor during one period T is proportional to

$$\int_0^T [E(t)]^2\,dt$$

If q(t) has the same period as E(t) and is to be an approximation to E(t), then the criterion of closeness might
be taken as the energy of the difference voltage. This is proportional to

$$\int_0^T [E(t) - q(t)]^2\,dt$$

which  is  again  a least  squares  criterion. 

Let y(x) be the vertical displacement of a uniform flexible string whose equilibrium position is along the x-axis
from x = 0 to x = l (Figure 10.19.6). The elastic potential energy of the string is proportional to

$$\int_0^l [y(x)]^2\,dx$$

If q(x) is to be an approximation to the displacement, then as before, the energy integral

$$\int_0^l [y(x) - q(x)]^2\,dx$$

determines  a least  squares  criterion  for  the  closeness  of  the  approximation. 
Figure 10.19.5


Least  squares  approximation  is  also  used  in  situations  where  there  is  no  a priori  justification  for  its  use,  such  as  for 
approximating  business  cycles,  population  growth  curves,  sales  curves,  and  so  forth.  It  is  used  in  these  cases 
because  of  its  mathematical  simplicity.  In  general,  if  no  other  error  criterion  is  immediately  apparent  for  an 
approximation  problem,  the  least  squares  criterion  is  the  one  most  often  chosen. 


The  following  result  was  obtained  in  Section  6.6. 


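Theorem 10.19.1 Minimizing Mean Square Error on [0, 2π]

If f(t) is continuous on [0, 2π], then the trigonometric function g(t) of the form

$$g(t) = \frac{1}{2}a_0 + a_1\cos t + \cdots + a_n\cos nt + b_1\sin t + \cdots + b_n\sin nt$$

that minimizes the mean square error

$$\int_0^{2\pi} [f(t) - g(t)]^2\,dt$$

has coefficients

$$a_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\cos kt\,dt, \quad k = 0, 1, 2, \ldots, n$$

$$b_k = \frac{1}{\pi}\int_0^{2\pi} f(t)\sin kt\,dt, \quad k = 1, 2, \ldots, n$$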

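If the original function f(t) is defined over the interval [0, T] instead of [0, 2π], a change of scale will yield the
following result (see Exercise 8):

Theorem 10.19.2 Minimizing Mean Square Error on [0, T]

If f(t) is continuous on [0, T], then the trigonometric function g(t) of the form

$$g(t) = \frac{1}{2}a_0 + a_1\cos\frac{2\pi}{T}t + \cdots + a_n\cos\frac{2n\pi}{T}t + b_1\sin\frac{2\pi}{T}t + \cdots + b_n\sin\frac{2n\pi}{T}t$$

that minimizes the mean square error

$$\int_0^T [f(t) - g(t)]^2\,dt$$

has coefficients

$$a_k = \frac{2}{T}\int_0^T f(t)\cos\frac{2k\pi t}{T}\,dt, \quad k = 0, 1, 2, \ldots, n$$

$$b_k = \frac{2}{T}\int_0^T f(t)\sin\frac{2k\pi t}{T}\,dt, \quad k = 1, 2, \ldots, n$$


EXAMPLE 1 Least Squares Approximation to a Sound Wave

Let a sound wave p(t) have a saw-tooth pattern with a basic frequency of 5000 cps (Figure 10.19.7).
Assume units are chosen so that the normal atmospheric pressure is at the zero level and the
maximum amplitude of the wave is A. The basic period of the wave is T = 1/5000 = .0002 second.
From t = 0 to t = T the function p(t) has the equation

$$p(t) = \frac{2A}{T}\left(\frac{T}{2} - t\right)$$

Theorem 10.19.2 then yields the following (verify):

$$a_0 = \frac{2}{T}\int_0^T p(t)\,dt = \frac{2}{T}\int_0^T \frac{2A}{T}\left(\frac{T}{2} - t\right)dt = 0$$

$$a_k = \frac{2}{T}\int_0^T p(t)\cos\frac{2k\pi t}{T}\,dt = \frac{2}{T}\int_0^T \frac{2A}{T}\left(\frac{T}{2} - t\right)\cos\frac{2k\pi t}{T}\,dt = 0, \quad k = 1, 2, \ldots$$

$$b_k = \frac{2}{T}\int_0^T p(t)\sin\frac{2k\pi t}{T}\,dt = \frac{2}{T}\int_0^T \frac{2A}{T}\left(\frac{T}{2} - t\right)\sin\frac{2k\pi t}{T}\,dt = \frac{2A}{k\pi}, \quad k = 1, 2, \ldots$$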

We can now investigate how the sound wave p(t) is perceived by the human ear. We note that
4/T = 20,000 cps, so we need only go up to k = 4 in the formulas above. The least squares
approximation to p(t) is then

$$q(t) = \frac{2A}{\pi}\left[\sin\frac{2\pi}{T}t + \frac{1}{2}\sin\frac{4\pi}{T}t + \frac{1}{3}\sin\frac{6\pi}{T}t + \frac{1}{4}\sin\frac{8\pi}{T}t\right]$$

The four sinusoidal terms have frequencies of 5000, 10,000, 15,000, and 20,000 cps, respectively. In
Figure 10.19.8 we have plotted p(t) and q(t) over one period. Although q(t) is not a very good
point-by-point approximation to p(t), to the ear, both p(t) and q(t) produce the same sensation of
sound.
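
As a numerical check of Example 1, the following Python sketch estimates the coefficients of Theorem 10.19.2 for the saw-tooth wave with a midpoint rule and compares each sine coefficient with 2A/(kπ). The function name `fourier_coefficients`, the sample count, and the value A = 1 are assumptions made for illustration; they do not come from the text.

```python
import math

def fourier_coefficients(f, T, n, samples=4000):
    """Midpoint-rule estimates of a_0..a_n and b_1..b_n from Theorem 10.19.2."""
    dt = T / samples
    ts = [(j + 0.5) * dt for j in range(samples)]
    a = [(2.0 / T) * dt * sum(f(t) * math.cos(2 * k * math.pi * t / T) for t in ts)
         for k in range(n + 1)]
    b = [(2.0 / T) * dt * sum(f(t) * math.sin(2 * k * math.pi * t / T) for t in ts)
         for k in range(1, n + 1)]
    return a, b

A = 1.0                        # maximum amplitude (illustrative value)
basic_frequency = 5000         # cycles per second, as in Example 1
T = 1.0 / basic_frequency      # basic period in seconds
n = 20000 // basic_frequency   # largest n with n / T <= 20,000 cps

def p(t):
    """Saw-tooth wave on 0 <= t < T."""
    return (2 * A / T) * (T / 2 - t)

a, b = fourier_coefficients(p, T, n)
for k in range(1, n + 1):
    # columns: k, estimated a_k, estimated b_k, closed form 2A/(k*pi)
    print(k, round(a[k], 6), round(b[k - 1], 6), round(2 * A / (k * math.pi), 6))
```

The cosine coefficients come out numerically negligible, and only the four audible sine terms are kept, which is the truncation used to build q(t).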
Figure 10.19.7


As  discussed  in  Section  6.6,  the  least  squares  approximation  becomes  better  as  the  number  of  terms  in  the 
approximating  trigonometric  polynomial  becomes  larger.  More  precisely, 


$$\int_0^{2\pi} \left[f(t) - \frac{1}{2}a_0 - \sum_{k=1}^{n} (a_k\cos kt + b_k\sin kt)\right]^2 dt$$


tends  to  zero  as  n approaches  infinity.  We  denote  this  by  writing 

$$f(t) \sim \frac{1}{2}a_0 + \sum_{k=1}^{\infty} (a_k\cos kt + b_k\sin kt)$$

where the right side of this equation is the Fourier series of f(t). Whether the Fourier series of f(t) converges to
f(t) for each t is another question, and a more difficult one. For most continuous functions encountered in
applications, the Fourier series does indeed converge to its corresponding function for each value of t.
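
The decrease of the mean square error with n can be observed numerically. The sketch below, which assumes the test function f(t) = (t − π)² on [0, 2π] and uses hypothetical helper names, builds the least squares trigonometric polynomial of order n from Theorem 10.19.1 and evaluates the error integral for several values of n.

```python
import math

def trig_least_squares(f, n, samples=2000):
    """Return g(t), the order-n least squares trigonometric polynomial on [0, 2*pi]."""
    dt = 2 * math.pi / samples
    ts = [(j + 0.5) * dt for j in range(samples)]
    a = [dt / math.pi * sum(f(t) * math.cos(k * t) for t in ts) for k in range(n + 1)]
    # b[0] is a placeholder so that b[k] lines up with a[k]
    b = [0.0] + [dt / math.pi * sum(f(t) * math.sin(k * t) for t in ts)
                 for k in range(1, n + 1)]

    def g(t):
        return a[0] / 2 + sum(a[k] * math.cos(k * t) + b[k] * math.sin(k * t)
                              for k in range(1, n + 1))
    return g

def mean_square_error(f, g, samples=2000):
    """Midpoint-rule estimate of the integral of [f(t) - g(t)]^2 over [0, 2*pi]."""
    dt = 2 * math.pi / samples
    return dt * sum((f((j + 0.5) * dt) - g((j + 0.5) * dt)) ** 2 for j in range(samples))

def f(t):
    return (t - math.pi) ** 2

errors = [mean_square_error(f, trig_least_squares(f, n)) for n in (1, 2, 4, 8)]
print(errors)  # the values shrink as n grows
```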


Exercise  Set  10.19 


1. Find the trigonometric polynomial of order 3 that is the least squares approximation to the function
$f(t) = (t - \pi)^2$ over the interval [0, 2π].


Answer:

$$\frac{\pi^2}{3} + 4\cos t + \cos 2t + \frac{4}{9}\cos 3t$$

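2. Find the trigonometric polynomial of order 4 that is the least squares approximation to the function $f(t) = t^2$
over the interval [0, T].

Answer:

$$\frac{T^2}{3} + \frac{T^2}{\pi^2}\left(\cos\frac{2\pi}{T}t + \frac{1}{4}\cos\frac{4\pi}{T}t + \frac{1}{9}\cos\frac{6\pi}{T}t + \frac{1}{16}\cos\frac{8\pi}{T}t\right) - \frac{T^2}{\pi}\left(\sin\frac{2\pi}{T}t + \frac{1}{2}\sin\frac{4\pi}{T}t + \frac{1}{3}\sin\frac{6\pi}{T}t + \frac{1}{4}\sin\frac{8\pi}{T}t\right)$$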


3. Find the trigonometric polynomial of order 4 that is the least squares approximation to the function f(t) over
the interval [0, 2π], where

$$f(t) = \begin{cases} \sin t, & 0 \le t \le \pi \\ 0, & \pi < t \le 2\pi \end{cases}$$

Answer:

$$\frac{1}{\pi} + \frac{1}{2}\sin t - \frac{2}{3\pi}\cos 2t - \frac{2}{15\pi}\cos 4t$$

4. Find the trigonometric polynomial of arbitrary order n that is the least squares approximation to the function
$f(t) = \sin\frac{1}{2}t$ over the interval [0, 2π].


Answer:

$$\frac{2}{\pi} - \frac{4}{\pi}\left(\frac{1}{1\cdot 3}\cos t + \frac{1}{3\cdot 5}\cos 2t + \frac{1}{5\cdot 7}\cos 3t + \cdots + \frac{1}{(2n-1)(2n+1)}\cos nt\right)$$

5.  Find  the  trigonometric  polynomial  of  arbitrary  order  n that  is  the  least  squares  approximation  to  the  function 
f (t)  over  the  interval  [0,  T] , where 


Answer:

$$\frac{T}{4} - \frac{8T}{(2\pi)^2}\cos\frac{2\pi t}{T} - \frac{8T}{(6\pi)^2}\cos\frac{6\pi t}{T} - \frac{8T}{(10\pi)^2}\cos\frac{10\pi t}{T} - \cdots - \frac{8T}{(2n\pi)^2}\cos\frac{2n\pi t}{T}$$

6. For the inner product


$$\langle u, v\rangle = \int_0^{2\pi} u(t)v(t)\,dt$$

show that

(a) $\|1\| = \sqrt{2\pi}$

(b) $\|\cos kt\| = \sqrt{\pi}$ for k = 1, 2, ...

(c) $\|\sin kt\| = \sqrt{\pi}$ for k = 1, 2, ...

7. Show that the 2n + 1 functions

1, cos t, cos 2t, ..., cos nt, sin t, sin 2t, ..., sin nt

are orthogonal over the interval [0, 2π] relative to the inner product $\langle u, v\rangle$ defined in Exercise 6.

8. If f(t) is defined and continuous on the interval [0, T], show that $f(T\tau/2\pi)$ is defined and continuous for $\tau$
in the interval [0, 2π]. Use this fact to show how Theorem 10.19.2 follows from Theorem 10.19.1.


Technology  Exercises 


The  following  exercises  are  designed  to  be  solved  using  a technology  utility.  Typically,  this  will  be  MATLAB, 
Mathematica,  Maple,  Derive,  or  Mathcad,  but  it  may  also  be  some  other  type  of  linear  algebra  software  or  a 
scientific  calculator  with  some  linear  algebra  capabilities.  For  each  exercise  you  will  need  to  read  the  relevant 
documentation  for  the  particular  utility  you  are  using.  The  goal  of  these  exercises  is  to  provide  you  with  a basic 
proficiency  with  your  technology  utility.  Once  you  have  mastered  the  techniques  in  these  exercises,  you  will  be 
able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the  regular  exercise  sets. 


T1. Let g be the function

$$g(t) = \frac{3 + 4\sin t}{5 - 4\cos t}$$

for 0 ≤ t ≤ 2π. Use a computer to determine the Fourier coefficients

$$\left\{\begin{matrix} a_k \\ b_k \end{matrix}\right\} = \frac{1}{\pi}\int_0^{2\pi} \left(\frac{3 + 4\sin t}{5 - 4\cos t}\right)\left\{\begin{matrix} \cos kt \\ \sin kt \end{matrix}\right\} dt$$

for k = 0, 1, 2, 3, 4, 5. From your results, make a conjecture about the general expressions for $a_k$ and $b_k$. Test your
conjecture by calculating

$$\frac{1}{2}a_0 + \sum_{k=1}^{\infty} (a_k\cos kt + b_k\sin kt)$$

on the computer and see whether it converges to g(t).
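
For readers working in a general-purpose language rather than one of the utilities named above, the coefficient integrals in Exercise T1 can be estimated along the following lines. This is only a sketch: the helper name `coefficient`, the midpoint rule, and the sample count are choices made here, not part of the exercise, and forming the conjecture is left to the reader.

```python
import math

def g(t):
    """The function of Exercise T1."""
    return (3 + 4 * math.sin(t)) / (5 - 4 * math.cos(t))

def coefficient(kind, k, samples=2000):
    """Midpoint-rule estimate of a_k (kind='a') or b_k (kind='b') on [0, 2*pi]."""
    trig = math.cos if kind == "a" else math.sin
    dt = 2 * math.pi / samples
    return dt / math.pi * sum(g((j + 0.5) * dt) * trig(k * (j + 0.5) * dt)
                              for j in range(samples))

a = [coefficient("a", k) for k in range(6)]      # a_0 .. a_5
b = [coefficient("b", k) for k in range(1, 6)]   # b_1 .. b_5

for k in range(6):
    print(k, a[k], b[k - 1] if k >= 1 else None)
```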


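T2. Let g be the function

$$g(t) = e^{\cos t}[\cos(\sin t) + \sin(\sin t)]$$

for 0 ≤ t ≤ 2π. Use a computer to determine the Fourier coefficients

$$\left\{\begin{matrix} a_k \\ b_k \end{matrix}\right\} = \frac{1}{\pi}\int_0^{2\pi} g(t)\left\{\begin{matrix} \cos kt \\ \sin kt \end{matrix}\right\} dt$$

for k = 0, 1, 2, 3, 4, 5. From your results, make a conjecture about the general expressions for $a_k$ and $b_k$. Test your
conjecture by calculating

$$\frac{1}{2}a_0 + \sum_{k=1}^{\infty} (a_k\cos kt + b_k\sin kt)$$

on the computer and see whether it converges to g(t).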
10.20 Warps and Morphs

Among  the  more  interesting  image-manipulation  techniques  available  for  computer  graphics  are  warps  and 
morphs.  In  this  section  we  show  how  linear  transformations  can  be  used  to  distort  a single  picture  to  produce 
a warp,  or  to  distort  and  blend  two  pictures  to  produce  a morph. 


Prerequisites 

Geometry of Linear Operators on $R^2$ (Section 4.11)
Linear Independence
Bases in $R^2$


Computer  graphics  software  enables  you  to  manipulate  an  image  in  various  ways,  such  as  by  scaling,  rotating, 
or slanting the image. Distorting an image by separately moving the corners of a rectangle containing the
image  is  another  basic  image-manipulation  technique.  Distorting  various  pieces  of  an  image  in  different  ways 
is  a more  complicated  procedure  that  results  in  a warp  of  the  picture.  In  addition,  warping  two  different 
images  in  complementary  ways  and  blending  the  warps  results  in  a morph  of  the  two  pictures  (from  the  Greek 
root  meaning  “shape”  or  “form”).  An  example  is  Figure  10.20.1  in  which  four  photographs  of  a woman  taken 
over  a 50-year  period  (the  four  diagonal  pictures  from  top  left  to  bottom  right)  have  been  pairwise  morphed 
by  different  amounts  to  suggest  the  gradual  aging  of  the  woman. 


Figure  10.20.1 

The  most  visible  application  of  warping  and  morphing  images  has  been  the  production  of  special  effects  in 
motion  pictures  and  television.  However,  many  scientific  and  technological  applications  of  such  techniques 
have  also  arisen — for  example,  studying  the  evolution,  growth,  and  development  of  living  organisms, 
assisting  in  reconstructive  and  cosmetic  surgery,  exploring  various  designs  of  a product,  and  “aging” 
photographs  of  missing  persons  or  police  suspects. 


Warps 

We  begin  by  describing  a simple  warp  of  a triangular  region  in  the  plane.  Let  the  three  vertices  of  a triangle  be 
given by the three noncollinear points $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ (Figure 10.20.2a). We will call this triangle the
begin-triangle. If $\mathbf{v}$ is any point in the begin-triangle, then there are unique constants $c_1$ and $c_2$ such that

$$\mathbf{v} - \mathbf{v}_3 = c_1(\mathbf{v}_1 - \mathbf{v}_3) + c_2(\mathbf{v}_2 - \mathbf{v}_3) \tag{1}$$

Equation 1 expresses the vector $\mathbf{v} - \mathbf{v}_3$ as a (unique) linear combination of the two linearly independent
vectors $\mathbf{v}_1 - \mathbf{v}_3$ and $\mathbf{v}_2 - \mathbf{v}_3$ with respect to an origin at $\mathbf{v}_3$. If we set $c_3 = 1 - c_1 - c_2$, then we can rewrite 1
as

$$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3 \tag{2}$$

where

$$c_1 + c_2 + c_3 = 1 \tag{3}$$

from the definition of $c_3$. We say that $\mathbf{v}$ is a convex combination of the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ if 2 and 3 are
satisfied and, in addition, the coefficients $c_1$, $c_2$, and $c_3$ are nonnegative. It can be shown (Exercise 6) that $\mathbf{v}$
lies in the triangle determined by $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ if and only if it is a convex combination of those three
vectors.
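
The coefficients in Equations 1 through 3 can be computed by solving the 2 × 2 linear system in Equation 1 with Cramer's rule. The following Python sketch does this and uses the sign of the coefficients to test whether a point lies in the triangle; the function names and the sample triangle are illustrative assumptions, not notation from the text.

```python
def convex_coefficients(v, v1, v2, v3):
    """Solve Equations 1 and 3 for (c1, c2, c3); points are (x, y) tuples."""
    ax, ay = v1[0] - v3[0], v1[1] - v3[1]   # v1 - v3
    bx, by = v2[0] - v3[0], v2[1] - v3[1]   # v2 - v3
    px, py = v[0] - v3[0], v[1] - v3[1]     # v  - v3
    det = ax * by - bx * ay
    if det == 0:
        raise ValueError("v1, v2, v3 are collinear")
    c1 = (px * by - bx * py) / det
    c2 = (ax * py - px * ay) / det
    return c1, c2, 1 - c1 - c2

def in_triangle(v, v1, v2, v3):
    """True when v is a convex combination of v1, v2, v3."""
    return all(c >= 0 for c in convex_coefficients(v, v1, v2, v3))

# Sample triangle chosen for illustration only.
tri = ((0, 0), (4, 0), (0, 4))
print(convex_coefficients((1, 1), *tri), in_triangle((1, 1), *tri))
print(convex_coefficients((3, 3), *tri), in_triangle((3, 3), *tri))
```

In floating-point work a small tolerance in place of the exact comparison `c >= 0` may be preferable for points on or near an edge.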
Figure 10.20.2

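Next, given three noncollinear points $\mathbf{w}_1$, $\mathbf{w}_2$, and $\mathbf{w}_3$ of an end-triangle (Figure 10.20.2b), there is a unique
affine transformation that maps $\mathbf{v}_1$ to $\mathbf{w}_1$, $\mathbf{v}_2$ to $\mathbf{w}_2$, and $\mathbf{v}_3$ to $\mathbf{w}_3$. That is, there is a unique 2 × 2 invertible
matrix M and a unique vector $\mathbf{b}$ such that

$$\mathbf{w}_i = M\mathbf{v}_i + \mathbf{b} \quad \text{for } i = 1, 2, 3 \tag{4}$$

(See Exercise 5 for the evaluation of M and $\mathbf{b}$.) Moreover, it can be shown (Exercise 3) that the image $\mathbf{w}$ of the
vector $\mathbf{v}$ in 2 under this affine transformation is

$$\mathbf{w} = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3 \tag{5}$$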


This  is  a basic  property  of  affine  transformations:  They  map  a convex  combination  of  vectors  to  the  same 
convex  combination  of  the  images  of  the  vectors. 
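
One way to obtain M and b in Equation 4 is to note that M must send v1 − v3 to w1 − w3 and v2 − v3 to w2 − w3, after which b = w3 − M v3. The sketch below follows that route and then checks Equation 5 on one convex combination. The function names and the two sample triangles are made up for illustration.

```python
def affine_from_triangles(v, w):
    """Return (M, b) with w_i = M v_i + b for the three point pairs in v and w."""
    (v1, v2, v3), (w1, w2, w3) = v, w
    a, c = v1[0] - v3[0], v1[1] - v3[1]      # first column of V = [v1-v3, v2-v3]
    b_, d = v2[0] - v3[0], v2[1] - v3[1]     # second column of V
    det = a * d - b_ * c
    if det == 0:
        raise ValueError("begin-triangle vertices are collinear")
    p, r = w1[0] - w3[0], w1[1] - w3[1]      # first column of W = [w1-w3, w2-w3]
    q, s = w2[0] - w3[0], w2[1] - w3[1]      # second column of W
    # M = W V^{-1}, with V^{-1} = (1/det) [[d, -b_], [-c, a]]
    M = (((p * d - q * c) / det, (q * a - p * b_) / det),
         ((r * d - s * c) / det, (s * a - r * b_) / det))
    b = (w3[0] - (M[0][0] * v3[0] + M[0][1] * v3[1]),
         w3[1] - (M[1][0] * v3[0] + M[1][1] * v3[1]))
    return M, b

def apply_affine(M, b, x):
    return (M[0][0] * x[0] + M[0][1] * x[1] + b[0],
            M[1][0] * x[0] + M[1][1] * x[1] + b[1])

# Illustrative data (not taken from the exercises).
begin = ((1, 1), (2, 3), (2, 1))
end = ((4, 3), (9, 5), (5, 3))
M, b = affine_from_triangles(begin, end)

# Equation 5: the image of a convex combination is the same combination of the images.
coeffs = (0.2, 0.3, 0.5)
v = tuple(sum(ci * pt[j] for ci, pt in zip(coeffs, begin)) for j in range(2))
w_direct = apply_affine(M, b, v)
w_combo = tuple(sum(ci * pt[j] for ci, pt in zip(coeffs, end)) for j in range(2))
print(M, b, w_direct, w_combo)
```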

Now  suppose  that  the  begin-triangle  contains  a picture  within  it  (Figure  10.20.3a).  That  is,  to  each  point  in  the 
begin-triangle we assign a gray level, say 0 for white and 100 for black, with any other gray level lying
between 0 and 100. In particular, let a scalar-valued function $\rho_0$, called the picture-density of the
begin-triangle, be defined so that $\rho_0(\mathbf{v})$ is the gray level at the point $\mathbf{v}$ in the begin-triangle. We can now define a
picture in the end-triangle, called a warp of the original picture, with a picture-density $\rho_1$ by defining the gray
level at the point $\mathbf{w}$ within the end-triangle to be the gray level of the point $\mathbf{v}$ in the begin-triangle that maps
onto $\mathbf{w}$. In equation form, the picture-density $\rho_1$ is determined by

$$\rho_1(\mathbf{w}) = \rho_0(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3) \tag{6}$$

In this way, as $c_1$, $c_2$, and $c_3$ vary over all nonnegative values that add to one, 5 generates all points $\mathbf{w}$ in the
end-triangle, and 6 generates the gray levels $\rho_1(\mathbf{w})$ of the warped picture at those points (Figure 10.20.3b).
Figure 10.20.3


Equation  6 determines  a very  simple  warp  of  a picture  within  a single  triangle.  More  generally,  we  can  break 
up  a picture  into  many  triangular  regions  and  warp  each  triangular  region  differently.  This  gives  us  much 
freedom  in  designing  a warp  through  our  choice  of  triangular  regions  and  how  we  change  them.  To  this  end, 
suppose we are given a picture contained within some rectangular region of the plane. We choose n points
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ within the rectangle, which we call vertex points, so that they fall on key elements or features of
the picture we wish to warp (Figure 10.20.4a). Once the vertex points are chosen, we complete a
triangulation of the rectangular region; that is, we draw line segments between the vertex points in such a
way that we have the following conditions (Figure 10.20.4b):

1. The line segments form the sides of a set of triangles.

2. The line segments do not intersect.

3. Each vertex point is the vertex of at least one triangle.

4. The union of the triangles is the rectangle.

5. The set of triangles is maximal (i.e., no more vertices can be connected).

Note that condition 4 requires that each corner of the rectangle containing the picture be a vertex point.

Figure 10.20.4


One  can  always  form  a triangulation  from  any  n vertex  points,  but  the  triangulation  is  not  necessarily  unique. 


For example, Figures 10.20.4b and 10.20.4c are two different triangulations of the set of vertex points in
Figure  10.20.4a.  Since  there  are  various  computer  algorithms  that  perform  triangulations  very  quickly,  it  is 
not  necessary  to  perform  the  tiresome  triangulation  task  by  hand;  one  need  only  specify  the  desired  vertex 
points  and  let  a computer  generate  a triangulation  from  them.  If  n is  the  number  of  vertex  points  chosen,  it  can 
be  shown  that  the  number  of  triangles  m of  any  triangulation  of  those  points  is  given  by 

m = 2n  — 2 — k (7) 

where  k is  the  number  of  vertex  points  lying  on  the  boundary  of  the  rectangle,  including  the  four  situated  at 
the  corner  points. 
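
Equation 7 is easy to check by direct arithmetic. The short sketch below (the function names are hypothetical) evaluates it for the smallest case of four corner points and for the counts n = 7, k = 5, and it also solves the formula for k using the 94 vertex points and 179 triangles quoted for Figure 10.20.6.

```python
def triangle_count(n, k):
    """Number of triangles m = 2n - 2 - k (Equation 7)."""
    return 2 * n - 2 - k

def boundary_count(n, m):
    """Equation 7 solved for k, the number of boundary vertex points."""
    return 2 * n - 2 - m

print(triangle_count(4, 4))      # only the four corners: the rectangle splits into 2 triangles
print(triangle_count(7, 5))      # 7 vertex points, 5 of them on the boundary
print(boundary_count(94, 179))   # boundary points implied by 94 vertices and 179 triangles
```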

The warp is specified by moving the n vertex points $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ to new locations $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ according
to the changes we desire in the picture (Figures 10.20.5a and 10.20.5b). However, we impose two restrictions
on the movements of the vertex points:

1. The four vertex points at the corners of the rectangle are to remain fixed, and any vertex point on a side of
the rectangle is to remain fixed or move to another point on the same side of the rectangle. All other vertex
points are to remain in the interior of the rectangle.

2. The triangles determined by the triangulation are not to overlap after their vertices have been moved.

The  first  restriction  guarantees  that  the  rectangular  shape  of  the  begin-picture  is  preserved.  The  second 
restriction  guarantees  that  the  displaced  vertex  points  still  form  a triangulation  of  the  rectangle  and  that  the 
new  triangulation  is  similar  to  the  original  one.  For  example,  Figure  10.20.5c  is  not  an  allowable  movement 
of  the  vertex  points  shown  in  Figure  10.20.5a.  Although  a violation  of  this  condition  can  be  handled 
mathematically  without  too  much  additional  effort,  the  resulting  warps  usually  produce  unnatural  results  and 
we  will  not  consider  them  here. 
Figure 10.20.5


Figure  10.20.6  is  a warp  of  a photograph  of  a woman  using  a triangulation  with  94  vertex  points  and  179 
triangles.  Note  that  the  vertex  points  in  the  begin-triangulation  are  chosen  to  lie  along  key  features  of  the 
picture  (hairline,  eyes,  lips,  etc.).  These  vertex  points  were  moved  to  final  positions  corresponding  to  those 
same  features  in  a picture  of  the  woman  taken  20  years  after  the  begin-picture.  Thus,  the  warped  picture 
represents  the  woman  forced  into  her  older  shape  but  using  her  younger  gray  levels. 
Figure 10.20.6


Time-Varying  Warps 

A time-varying  warp  is  the  set  of  warps  generated  when  the  vertex  points  of  the  begin-picture  are  moved 
continually in time from their original positions to specified final positions. This gives us a motion picture
in which the begin-picture is continually warped to a final warp. Let us choose time units so that t = 0
corresponds  to  our  begin-picture  and  t = 1 corresponds  to  our  final  warp.  The  simplest  way  of  moving  the 
vertex  points  from  time  0 to  time  1 is  with  constant  velocity  along  straight-line  paths  from  their  initial 


positions  to  their  final  positions. 


To describe such a motion, let $\mathbf{u}_i(t)$ denote the position of the ith vertex point at any time t between 0 and 1.
Thus $\mathbf{u}_i(0) = \mathbf{v}_i$ (its given position in the begin-picture) and $\mathbf{u}_i(1) = \mathbf{w}_i$ (its given position in the final warp).
In between, we determine its position by

$$\mathbf{u}_i(t) = (1 - t)\mathbf{v}_i + t\mathbf{w}_i \tag{8}$$

Note that 8 expresses $\mathbf{u}_i(t)$ as a convex combination of $\mathbf{v}_i$ and $\mathbf{w}_i$ for each t in [0, 1]. Figure 10.20.7
illustrates  a time-varying  triangulation  of  a plain  rectangular  region  with  six  vertex  points.  The  lines 
connecting  the  vertex  points  at  the  different  times  are  the  space-time  paths  of  these  vertex  points  in  this 
space-time  diagram. 
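
Equation 8 amounts to a linear interpolation of every vertex point. A minimal Python sketch, with a hypothetical function name and made-up sample points, is:

```python
def vertex_positions(begin, end, t):
    """Positions u_i(t) = (1 - t) v_i + t w_i for matching lists of (x, y) points."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [((1 - t) * vx + t * wx, (1 - t) * vy + t * wy)
            for (vx, vy), (wx, wy) in zip(begin, end)]

# Two vertex points moving from their begin positions to their end positions.
begin_points = [(0, 0), (2, 2)]
end_points = [(4, 0), (2, 6)]
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, vertex_positions(begin_points, end_points, t))
```

Each vertex moves with constant velocity along the segment joining its two positions, which is the simplest motion described above.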


Once  the  positions  of  the  vertex  points  are  computed  at  time  t,  a warp  is  performed  between  the  begin-picture 
and  the  triangulation  at  time  t determined  by  the  displaced  vertex  points  at  that  time.  Figure  10.20.8  shows  a 
time-varying  warp  at  five  values  of  t generated  from  the  warp  between  t = 0 and  t = 1 shown  in  Figure 
10.20.6. 


t = 0.00  t = 0.25  t = 0.50  t = 0.75  t = 1.00

Figure  10.20.8 


Morphs 


A time-varying  morph  can  be  described  as  a blending  of  two  time-varying  warps  of  two  different  pictures 
using  two  triangulations  that  match  corresponding  features  in  the  two  pictures.  One  of  the  two  pictures  is 
designated as the begin-picture and the other as the end-picture. First, a time-varying warp from t = 0 to
t = 1 is generated in which the begin-picture is warped into the shape of the end-picture. Then a time-varying
warp from t = 1 to t = 0 is generated in which the end-picture is warped into the shape of the begin-picture.
Finally,  a weighted  average  of  the  gray  levels  of  the  two  warps  at  each  time  t is  produced  to  generate  the 
morph  of  the  two  images  at  time  t. 


Figure  10.20.9  shows  two  photographs  of  a woman  taken  20  years  apart.  Below  the  pictures  are  two 
corresponding  triangulations  in  which  corresponding  features  of  the  two  photographs  are  matched.  The 
time-varying morph between these two pictures for five values of t between 0 and 1 is shown in Figure
10.20.10.


Begin-picture  End-picture  Begin-triangulation  End-triangulation


Figure  10.20.9 


Figure  10.20.10 


The  procedure  for  producing  such  a morph  is  outlined  in  the  following  nine  steps  (Figure  10.20.11): 

Step 1. Given a begin-picture with picture-density $\rho_0$ and an end-picture with picture-density $\rho_1$, position n
vertex points $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ in the begin-picture at key features of that picture.

Step 2. Position n corresponding vertex points $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ in the end-picture at the corresponding key
features of that picture.

Step 3. Triangulate the begin- and end-pictures in similar ways by drawing lines between corresponding
vertex points in both pictures.

Step 4. For any time t between 0 and 1, find the vertex points $\mathbf{u}_1(t), \mathbf{u}_2(t), \ldots, \mathbf{u}_n(t)$ in the morph picture at
that time, using the formula

$$\mathbf{u}_i(t) = (1 - t)\mathbf{v}_i + t\mathbf{w}_i, \quad i = 1, 2, \ldots, n \tag{9}$$

Step 5. Triangulate the morph picture at time t similar to the begin- and end-picture triangulations.

Step 6. For any point $\mathbf{u}$ in the morph picture at time t, find the triangle in the triangulation of the morph
picture in which it lies and the vertices $\mathbf{u}_I(t)$, $\mathbf{u}_J(t)$, and $\mathbf{u}_K(t)$ of that triangle. (See Exercise 1 to
determine whether a given point lies in a given triangle.)

Step 7. Express $\mathbf{u}$ as a convex combination of $\mathbf{u}_I(t)$, $\mathbf{u}_J(t)$, and $\mathbf{u}_K(t)$ by finding the constants $c_I$, $c_J$, and
$c_K$ such that

$$\mathbf{u} = c_I\mathbf{u}_I(t) + c_J\mathbf{u}_J(t) + c_K\mathbf{u}_K(t) \tag{10}$$

and

$$c_I + c_J + c_K = 1 \tag{11}$$

Step 8. Determine the locations of the point $\mathbf{u}$ in the begin- and end-pictures using

$$\mathbf{v} = c_I\mathbf{v}_I + c_J\mathbf{v}_J + c_K\mathbf{v}_K \quad \text{(in the begin-picture)} \tag{12}$$

and

$$\mathbf{w} = c_I\mathbf{w}_I + c_J\mathbf{w}_J + c_K\mathbf{w}_K \quad \text{(in the end-picture)} \tag{13}$$

Step 9. Finally, determine the picture-density $\rho_t(\mathbf{u})$ of the morph-picture at the point $\mathbf{u}$ using

$$\rho_t(\mathbf{u}) = (1 - t)\rho_0(\mathbf{v}) + t\rho_1(\mathbf{w}) \tag{14}$$

Step  9 is  the  key  step  in  distinguishing  a warp  from  a morph.  Equation  14  takes  weighted  averages  of  the  gray 
levels  of  the  begin-  and  end-pictures  to  produce  the  gray  levels  of  the  morph-picture.  The  weights  depend  on 
the  fraction  of  the  distances  that  the  vertex  points  have  moved  from  their  beginning  positions  to  their  ending 
positions.  For  example,  if  the  vertex  points  have  moved  one-fourth  of  the  way  to  their  destinations  (i.e.,  if 
t = 0.25), then we use one-fourth of the gray levels of the end-picture and three-fourths of the gray levels of
the begin-picture. Thus, as time progresses, not only does the shape of the begin-picture gradually change into
the  shape  of  the  end-picture  (as  in  a warp)  but  the  gray  levels  of  the  begin-picture  also  gradually  change  into 
the  gray  levels  of  the  end-picture. 
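
For a single pair of corresponding triangles, Steps 4 and 6 through 9 can be sketched in Python as follows. The picture-densities are passed in as ordinary functions of a point, and the function names, sample triangles, and sample densities are all assumptions made for illustration; a full implementation would also have to locate the triangle containing each point (Step 6) across the whole triangulation.

```python
def lerp(p, q, t):
    """Equation 9 for one vertex: (1 - t) p + t q."""
    return ((1 - t) * p[0] + t * q[0], (1 - t) * p[1] + t * q[1])

def barycentric(u, a, b, c):
    """Coefficients (cI, cJ, cK) of Equations 10 and 11 for the triangle a, b, c."""
    ax, ay = a[0] - c[0], a[1] - c[1]
    bx, by = b[0] - c[0], b[1] - c[1]
    px, py = u[0] - c[0], u[1] - c[1]
    det = ax * by - bx * ay
    ci = (px * by - bx * py) / det
    cj = (ax * py - px * ay) / det
    return ci, cj, 1 - ci - cj

def combine(coeffs, points):
    """Convex combination of three points (Equations 12 and 13)."""
    return (sum(c * p[0] for c, p in zip(coeffs, points)),
            sum(c * p[1] for c, p in zip(coeffs, points)))

def morph_density(u, t, begin_tri, end_tri, rho0, rho1, tol=1e-12):
    """Gray level of the morph-picture at u, or None if u is outside the triangle."""
    tri_t = [lerp(v, w, t) for v, w in zip(begin_tri, end_tri)]   # Step 4
    coeffs = barycentric(u, *tri_t)                               # Step 7
    if min(coeffs) < -tol:                                        # Step 6 (one triangle only)
        return None
    v = combine(coeffs, begin_tri)                                # Step 8, Equation 12
    w = combine(coeffs, end_tri)                                  # Step 8, Equation 13
    return (1 - t) * rho0(v) + t * rho1(w)                        # Step 9, Equation 14

# Illustrative triangles and densities (gray level grows with x in both pictures).
begin_tri = [(0, 0), (4, 0), (0, 4)]
end_tri = [(0, 0), (4, 0), (2, 4)]
rho0 = lambda pt: 10 * pt[0]
rho1 = lambda pt: 10 * pt[0]

u = (5 / 3, 4 / 3)   # centroid of the triangle at t = 0.5
print(morph_density(u, 0.5, begin_tri, end_tri, rho0, rho1))
print(morph_density((10, 10), 0.5, begin_tri, end_tri, rho0, rho1))
```

The second call returns None because the point lies outside the interpolated triangle; in a complete morph that point would be handled by whichever triangle of the triangulation contains it.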


Figure 10.20.11  Time = 0: begin-picture, given density $\rho_0(\mathbf{v})$. Time = t: morph-picture, computed density
$\rho_t(\mathbf{u}) = (1 - t)\rho_0(\mathbf{v}) + t\rho_1(\mathbf{w})$. Time = 1: end-picture, given density $\rho_1(\mathbf{w})$.


The  procedure  described  above  to  generate  a morph  is  cumbersome  to  perform  by  hand,  but  it  is  the  kind  of 
dull,  repetitive  procedure  at  which  computers  excel.  A successful  morph  demands  good  preparation  and 
requires  more  artistic  ability  than  mathematical  ability.  (The  software  designer  is  required  to  have  the 
mathematical  ability.)  The  two  photographs  to  be  morphed  should  be  carefully  chosen  so  that  they  have 
matching  features,  and  the  vertex  points  in  the  two  photographs  also  should  be  carefully  chosen  so  that  the 
triangles  in  the  two  resulting  triangulations  contain  similar  features  of  the  two  pictures.  When  the  procedure  is 
done  correctly,  each  frame  of  the  morph  should  look  just  as  “real”  as  the  begin-  and  end-pictures. 

The  techniques  we  have  discussed  in  this  section  can  be  generalized  in  numerous  ways  to  produce  much  more 
elaborate  warps  and  morphs.  For  example: 

If  the  pictures  are  in  color,  the  three  components  of  the  picture  colors  (red,  green,  and  blue)  can  be 
morphed  separately  to  produce  a color  morph. 

Rather  than  following  straight-line  paths  to  their  destinations,  the  vertices  of  a triangulation  can  be  directed 
separately  along  more  complicated  paths  to  produce  a variety  of  results. 

Rather  than  travel  with  constant  speeds  along  their  paths,  the  vertices  of  a triangulation  can  be  directed  to 
have  different  speeds  at  different  times.  For  example,  in  a morph  between  two  faces,  the  hairline  can  be 
made  to  change  first,  then  the  nose,  and  so  forth. 

Similarly,  the  gray-level  mixing  of  the  begin-picture  and  end-picture  at  different  times  and  different 
vertices  can  be  varied  in  a more  complicated  way  than  that  in  Equation  14. 

One  can  morph  two  surfaces  in  three-dimensional  space  (representing  two  complete  heads,  for  example) 
by  triangulating  the  surfaces  and  using  the  techniques  in  this  section. 


One  can  morph  two  solids  in  three-dimensional  space  (for  example,  two  three-dimensional  tomographs  of 
a beating  human  heart  at  two  different  times)  by  dividing  the  two  solids  into  corresponding  tetrahedral 
regions. 

Two  film  strips  can  be  morphed  frame  by  frame  by  different  amounts  between  each  pair  of  frames  to 
produce  a morphed  film  strip  in  which,  say,  an  actor  walking  along  a set  is  gradually  morphed  into  an  ape 
walking  along  the  set. 

Instead  of  using  straight  lines  to  triangulate  two  pictures  to  be  morphed,  more  complicated  curves,  such  as 
spline  curves,  can  be  matched  between  the  two  pictures. 

Three  or  more  pictures  can  be  morphed  together  by  generalizing  the  formulas  given  in  this  section. 

These  and  other  generalizations  have  made  warping  and  morphing  two  of  the  most  active  areas  in  computer 
graphics. 


Exercise  Set  10.20 

1. Determine whether the vector $\mathbf{v}$ is a convex combination of the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$. Do this by solving
Equations 1 and 3 for $c_1$, $c_2$, and $c_3$ and ascertaining whether these coefficients are nonnegative.

W v = 

(b) v  = 

(c)  v = 

(d)v  = 

Answer: 

(a)  Yes;  v = jvi  + jV2  + JV3 

(b)  No;  v = -jvi  + ^-V2  — ^3 

(c)  Yes;  v = yvi  + -jV2  4-  OV3 

(d)  Yes;  v = yjVi  + y^-v2  + -jjv3 

2.  Verify  Equation  7 for  the  two  triangulations  given  in  Figure  10.20.4. 

Answer: 

m = number of triangles = 7, n = number of vertex points = 7, k = number of boundary vertex points
= 5; Equation 7 is 7 = 2(7) − 2 − 5.

3. Let an affine transformation be given by a 2 × 2 matrix M and a two-dimensional vector $\mathbf{b}$. Let
$\mathbf{v} = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3$, where $c_1 + c_2 + c_3 = 1$; let $\mathbf{w} = M\mathbf{v} + \mathbf{b}$; and let $\mathbf{w}_i = M\mathbf{v}_i + \mathbf{b}$ for i = 1, 2, 3.
Show that $\mathbf{w} = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3$. (This shows that an affine transformation maps a convex
combination of vectors to the same convex combination of the images of the vectors.)




Answer: 


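$$\mathbf{w} = M\mathbf{v} + \mathbf{b} = M(c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + c_3\mathbf{v}_3) + (c_1 + c_2 + c_3)\mathbf{b}$$
$$= c_1(M\mathbf{v}_1 + \mathbf{b}) + c_2(M\mathbf{v}_2 + \mathbf{b}) + c_3(M\mathbf{v}_3 + \mathbf{b}) = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 + c_3\mathbf{w}_3$$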

(a)  Exhibit  a triangulation  of  the  points  in  Figure  10.20.4  in  which  the  points  V3,  v_j,  and  v$  form  the 
vertices  of  a single  triangle. 

(b)  Exhibit  a triangulation  of  the  points  in  Figure  10.20.4  in  which  the  points  V2,  v_j,  and  vj  do  not  form 
the  vertices  of  a single  triangle. 

Answer: 

(a)  [Figure: a triangulation of the points v1, ..., v7]

(b)  [Figure: a triangulation of the points v1, ..., v7]

5.  Find the 2 x 2 matrix M and two-dimensional vector b that define the affine transformation that maps the
three vectors v1, v2, and v3 to the three vectors w1, w2, and w3. Do this by setting up a system of six
linear equations for the four entries of the matrix M and the two entries of the vector b.


1 

2 

2 

4 

9 

5 

1 

. v2  = 

3 

. v3  = 

1 
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3 

, w2  = 
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3 

(b) 

vi  = 
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00  Csl 

1 

1  
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1 
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1  

. v3  = 

2 
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5 

4 

(c) 

vi  = 

-2 

1 

- v2  = 

3 

5 

- v3  = 

1 

0 
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O CS1 
1 
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3 

(d) 

VI  = 

1 1 
O Csl 

'2 
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2 
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v3  = 

1 1 

cm 

,W1  = 

5' 

2 

, w2  = 

1 1 

00  to  l-o 

, W3  = 

7 

‘2 

-9 

Answer: 


(a)  Let a and b be linearly independent vectors in the plane. Show that if c1 and c2 are nonnegative
numbers such that c1 + c2 = 1, then the vector c1a + c2b lies on the line segment connecting the tips
of the vectors a and b.

(b)  Let a and b be linearly independent vectors in the plane. Show that if c1 and c2 are nonnegative
numbers such that c1 + c2 ≤ 1, then the vector c1a + c2b lies in the triangle connecting the origin
and the tips of the vectors a and b. [Hint: First examine the vector c1a + c2b multiplied by the scale
factor 1/(c1 + c2).]

(c)  Let v1, v2, and v3 be noncollinear points in the plane. Show that if c1, c2, and c3 are nonnegative
numbers such that c1 + c2 + c3 = 1, then the vector c1v1 + c2v2 + c3v3 lies in the triangle
connecting the tips of the three vectors. [Hint: Let a = v1 - v3 and b = v2 - v3, and then use
Equation 1 and part (b) of this exercise.]

(a)  What can you say about the coefficients c1, c2, and c3 that determine a convex combination
v = c1v1 + c2v2 + c3v3 if v lies on one of the three vertices of the triangle determined by the three
vectors v1, v2, and v3?

(b)  What can you say about the coefficients c1, c2, and c3 that determine a convex combination
v = c1v1 + c2v2 + c3v3 if v lies on one of the three sides of the triangle determined by the three
vectors v1, v2, and v3?

(c)  What can you say about the coefficients c1, c2, and c3 that determine a convex combination
v = c1v1 + c2v2 + c3v3 if v lies in the interior of the triangle determined by the three vectors v1, v2,
and v3?

Answer: 

(a)  Two  of  the  coefficients  are  zero. 

(b)  At  least  one  of  the  coefficients  is  zero. 

(c)  None  of  the  coefficients  are  zero. 
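
These three cases can be explored computationally. The sketch below is one possible approach, not the text's method: it solves the 3 x 3 system (the two component equations together with c1 + c2 + c3 = 1) by Cramer's rule and then counts the zero coefficients. The function names and the test triangle are hypothetical.

```python
from fractions import Fraction as F

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def convex_coefficients(v, v1, v2, v3):
    """Solve c1*v1 + c2*v2 + c3*v3 = v with c1 + c2 + c3 = 1 by Cramer's rule."""
    A = [[v1[0], v2[0], v3[0]],
         [v1[1], v2[1], v3[1]],
         [1, 1, 1]]
    rhs = [v[0], v[1], 1]
    d = det3(A)  # nonzero exactly when v1, v2, v3 are noncollinear
    coeffs = []
    for j in range(3):
        Aj = [row[:] for row in A]
        for r in range(3):
            Aj[r][j] = rhs[r]  # replace column j by the right-hand side
        coeffs.append(F(det3(Aj), d))
    return coeffs

def locate(coeffs):
    """Classify a point by its coefficients (most specific case is reported)."""
    if any(c < 0 for c in coeffs):
        return "outside"
    zeros = sum(1 for c in coeffs if c == 0)
    return {0: "interior", 1: "side", 2: "vertex"}[zeros]

# Hypothetical triangle used only for illustration.
tri = ([0, 0], [4, 0], [0, 4])
for point in ([1, 1], [2, 2], [4, 0], [5, 5]):
    print(point, locate(convex_coefficients(point, *tri)))
```

A vertex is also a point on two sides, which is why answer (b) says "at least one" coefficient is zero; the sketch reports "vertex" in that case rather than "side".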

(a)  The centroid of a triangle lies on the line segment connecting any one of the three vertices of the
triangle with the midpoint of the opposite side. Its location on this line segment is two-thirds of the
distance from the vertex. If the three vertices are given by the vectors v1, v2, and v3, write the
centroid as a convex combination of these three vectors.

(b)  Use  your  result  in  part  (a)  to  find  the  vector  defining  the  centroid  of  the  triangle  with  the  three  vertices 


Answer: 


(a)  (1/3)v1 + (1/3)v2 + (1/3)v3

(b)  The centroid is the vector with components 8/3 and 2.
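
The geometric description of the centroid and the convex-combination formula can be compared directly. This is a small sketch with hypothetical vertices (not the vertices of part (b)); it computes the point two-thirds of the way from each vertex to the opposite midpoint and compares it with (1/3)(v1 + v2 + v3).

```python
from fractions import Fraction as F

def centroid_from_vertex(a, b, c):
    """Point two-thirds of the way from vertex a to the midpoint of side bc."""
    mid = [(b[k] + c[k]) / 2 for k in range(2)]
    return [a[k] + F(2, 3) * (mid[k] - a[k]) for k in range(2)]

def centroid(v1, v2, v3):
    """Convex combination with all three coefficients equal to 1/3."""
    return [F(1, 3) * (v1[k] + v2[k] + v3[k]) for k in range(2)]

# Hypothetical vertices, chosen only for illustration.
v1, v2, v3 = [F(0), F(0)], [F(6), F(0)], [F(0), F(3)]
routes = [centroid_from_vertex(v1, v2, v3),
          centroid_from_vertex(v2, v3, v1),
          centroid_from_vertex(v3, v1, v2)]
print(all(r == centroid(v1, v2, v3) for r in routes))
```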


Technology  Exercises 


The  following  exercises  are  designed  to  be  solved  using  a technology  utility.  Typically,  this  will  be 
MATLAB,  Mathematica,  Maple,  Derive,  or  Mathcad,  but  it  may  also  be  some  other  type  of  linear  algebra 
software  or  a scientific  calculator  with  some  linear  algebra  capabilities.  For  each  exercise  you  will  need  to 
read  the  relevant  documentation  for  the  particular  utility  you  are  using.  The  goal  of  these  exercises  is  to 
provide  you  with  a basic  proficiency  with  your  technology  utility.  Once  you  have  mastered  the  techniques  in 
these  exercises,  you  will  be  able  to  use  your  technology  utility  to  solve  many  of  the  problems  in  the  regular 
exercise  sets. 


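T1.  To warp or morph a surface in R^3 we must be able to triangulate the surface. Let

\[
\mathbf{v}_1 = \begin{bmatrix} v_{11} \\ v_{12} \\ v_{13} \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} v_{21} \\ v_{22} \\ v_{23} \end{bmatrix}, \quad \text{and} \quad
\mathbf{v}_3 = \begin{bmatrix} v_{31} \\ v_{32} \\ v_{33} \end{bmatrix}
\]

be three noncollinear vectors on the surface. Then a vector

\[
\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}
\]

lies in the triangle formed by these three vectors if and only if v is a convex combination of the three vectors; that is,
v = c1v1 + c2v2 + c3v3 for some nonnegative coefficients c1, c2, and c3 whose sum is 1.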

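(a)  Show that in this case, c1, c2, and c3 are solutions of the following linear system:

\[
\begin{bmatrix} v_{11} & v_{21} & v_{31} \\ v_{12} & v_{22} & v_{32} \\ v_{13} & v_{23} & v_{33} \\ 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}
=
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ 1 \end{bmatrix}
\]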

In  parts  (b)-(d)  determine  whether  the  vector  v is  a convex  combination  of  the  vectors  vi  = 


(b) 


(c) 


3" 

2" 

v2  = 

0 

, and  V3  = 

2 

9 

-4 

'9' 

V = I 

9 

4 

9 

2 

7 

-5 


10 

9 

9 


(d)  1 r 13  ■ 

V=T  -7 


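T2.  To warp or morph a solid object in R^3 we first partition the object into disjoint tetrahedrons. Let

\[
\mathbf{v}_1 = \begin{bmatrix} v_{11} \\ v_{12} \\ v_{13} \end{bmatrix}, \quad
\mathbf{v}_2 = \begin{bmatrix} v_{21} \\ v_{22} \\ v_{23} \end{bmatrix}, \quad
\mathbf{v}_3 = \begin{bmatrix} v_{31} \\ v_{32} \\ v_{33} \end{bmatrix}, \quad \text{and} \quad
\mathbf{v}_4 = \begin{bmatrix} v_{41} \\ v_{42} \\ v_{43} \end{bmatrix}
\]

be four noncoplanar vectors. Then a vector

\[
\mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}
\]

lies in the solid tetrahedron formed by these four vectors if and only if v is a convex combination of
the four vectors; that is, v = c1v1 + c2v2 + c3v3 + c4v4 for some nonnegative coefficients c1, c2, c3, and
c4 whose sum is one.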


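(a)  Show that in this case, c1, c2, c3, and c4 are solutions of the following linear system:

\[
\begin{bmatrix} v_{11} & v_{21} & v_{31} & v_{41} \\ v_{12} & v_{22} & v_{32} & v_{42} \\ v_{13} & v_{23} & v_{33} & v_{43} \\ 1 & 1 & 1 & 1 \end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix}
=
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ 1 \end{bmatrix}
\]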
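
For readers whose technology utility is a general-purpose language, the 4 x 4 system above can be solved as follows. This is a minimal sketch under stated assumptions: the helper names `solve` and `in_tetrahedron` and the sample tetrahedron are hypothetical, and Gauss-Jordan elimination with exact fractions stands in for whatever solver your utility provides.

```python
from fractions import Fraction as F

def solve(A, rhs):
    """Solve the square system A x = rhs by Gauss-Jordan elimination."""
    n = len(A)
    aug = [[F(x) for x in row] + [F(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: the four vectors are coplanar")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def in_tetrahedron(v, verts):
    """Return (inside, coefficients) for the point v and four vertices."""
    A = [[vert[k] for vert in verts] for k in range(3)] + [[1, 1, 1, 1]]
    c = solve(A, list(v) + [1])
    return all(ci >= 0 for ci in c), c

# Hypothetical tetrahedron, chosen only for illustration.
verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(in_tetrahedron((F(1, 4), F(1, 4), F(1, 4)), verts))
print(in_tetrahedron((1, 1, 1), verts))
```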


In  parts  (b)-(d)  determine  whether  the  vector  v is  a convex  combination  of  the  vectors  vi 

”31  [71  r_i 

v2  = 4 , V3  = 2 , and  V4  = 3 . 

2J  L3J  L 2 

(b)  [5" 

v = 0 

7 

(c)  fl" 

v=  1 

2 

(d)  rr 

v = 2 
2 
APPENDIX A


How  to  Read  Theorems 

Since  many  of  the  most  important  concepts  in  linear  algebra  occur  as  theorem  statements,  it  is  important  to  be 
familiar  with  the  various  ways  in  which  theorems  can  be  structured.  This  appendix  will  help  you  to  do  that. 


Contrapositive  Form  of  a Theorem 

The  simplest  theorems  are  of  the  form 


If H is true, then C is true.  (1)

where H is a statement, called the hypothesis, and C is a statement, called the conclusion. The theorem is true
if the conclusion is true whenever the hypothesis is true, and the theorem is false if there is some case where
the hypothesis is true but the conclusion is false. It is common to denote a theorem of form 1 as

H ⇒ C  (2)

(read, "H implies C"). As an example, the theorem

If  a and  b are  both  positive  numbers,  then  ab  is  a positive  number.  (3) 


is  of  form  2,  where 


H = a and b are both positive numbers  (4)

C = ab is a positive number  (5)

Sometimes  it  is  desirable  to  phrase  theorems  in  a negative  way.  For  example,  the  theorem  in  3 can  be 
rephrased  equivalently  as 

If  ab  is  not  a positive  number,  then  a and  b are  not  both  positive  numbers.  (6) 

If we write ~H to mean that 4 is false and ~C to mean that 5 is false, then the structure of the theorem in 6
is

~C ⇒ ~H  (7)


In  general,  any  theorem  of  form  2 can  be  rephrased  in  form  7,  which  is  called  the  contrapositive  of  2.  If  a 
theorem  is  true,  then  so  is  its  contrapositive,  and  vice  versa. 


Converse  of  a Theorem 

The converse of a theorem is the statement that results when the hypothesis and conclusion are interchanged.
Thus, the converse of the theorem H ⇒ C is the statement C ⇒ H. Whereas the contrapositive of a true
theorem must itself be a true theorem, the converse of a true theorem may or may not be true. For example,
the converse of 3 is the false statement

If ab is a positive number, then a and b are both positive numbers.

but the converse of the true theorem

If a > b, then 2a > 2b.  (8)

is the true theorem

If 2a > 2b, then a > b.  (9)
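
The difference between a contrapositive and a converse can be confirmed by checking every combination of truth values. The sketch below assumes a hypothetical helper `implies` that encodes "if p then q" as "not p, or q".

```python
from itertools import product

def implies(p, q):
    """Truth value of the statement 'if p then q'."""
    return (not p) or q

rows = list(product([False, True], repeat=2))

# H => C and ~C => ~H agree for every assignment of truth values.
contrapositive_agrees = all(
    implies(h, c) == implies(not c, not h) for h, c in rows)

# H => C and C => H do not always agree.
converse_agrees = all(implies(h, c) == implies(c, h) for h, c in rows)

print(contrapositive_agrees, converse_agrees)
```

The converse fails to agree in the rows where exactly one of H and C is true, which is the situation in the counterexample to the converse of 3 (take a and b both negative).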


Equivalent  Statements 

If a theorem H ⇒ C and its converse C ⇒ H are both true, then we say that H and C are equivalent
statements, which we denote by writing

H ⇔ C  (10)

(read, "H and C are equivalent"). There are various ways of phrasing equivalent statements as a single
theorem. Here are three ways in which 8 and 9 can be combined into a single theorem.


Form 1

If a > b, then 2a > 2b, and conversely, if 2a > 2b, then a > b.

Form 2

a > b if and only if 2a > 2b.

Form 3

The following statements are equivalent.

(i)  a > b

(ii)  2a > 2b


Theorems  Involving  Three  or  More  Statements 


Sometimes two true theorems will give you a third true theorem for free. Specifically, if H ⇒ C is a true
theorem, and C ⇒ D is a true theorem, then H ⇒ D must also be a true theorem. For example, the theorems

If opposite sides of a quadrilateral are parallel, then the quadrilateral is a parallelogram.
and 

Opposite  sides  of  a parallelogram  have  equal  lengths. 

imply  the  third  theorem 

If  opposite  sides  of  a quadrilateral  are  parallel,  then  they  have  equal  lengths. 

Sometimes three theorems yield equivalent statements for free. For example, if

H ⇒ C,  C ⇒ D,  D ⇒ H  (11)

then we have the implication loop in Figure A.1 from which we can conclude that

C ⇒ H,  D ⇒ C,  H ⇒ D  (12)

Combining this with 11 we obtain

H ⇔ C,  C ⇔ D,  D ⇔ H  (13)

In summary, if you want to prove the three equivalences in 13, you need only prove the three implications in
11.
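
The implication loop can be verified by brute force over all eight truth assignments. In this sketch the helper `implies` is a hypothetical name; the check confirms that whenever the three implications in 11 hold, H, C, and D all have the same truth value, which is what the equivalences in 13 assert.

```python
from itertools import product

def implies(p, q):
    """Truth value of the statement 'if p then q'."""
    return (not p) or q

# Keep only the assignments in which H => C, C => D, and D => H all hold.
consistent = [
    (h, c, d)
    for h, c, d in product([False, True], repeat=3)
    if implies(h, c) and implies(c, d) and implies(d, h)
]

loop_forces_equivalence = all(h == c == d for h, c, d in consistent)
print(consistent, loop_forces_equivalence)
```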


Figure A.1  [Implication loop: H ⇒ C, C ⇒ D, D ⇒ H]
APPENDIX B


Complex  Numbers 


Complex numbers arise naturally in the course of solving polynomial equations. For example, the solutions of
the quadratic equation $ax^2 + bx + c = 0$, which are given by the quadratic formula

\[
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\]

are complex numbers if the expression inside the radical is negative. In this appendix we will review some of
the basic ideas about complex numbers that are used in this text.


Complex  Numbers 

To deal with the problem that the equation $x^2 = -1$ has no real solutions, mathematicians of the eighteenth
century invented the "imaginary" number

\[
i = \sqrt{-1}
\]

which is assumed to have the property

\[
i^2 = \left(\sqrt{-1}\right)^2 = -1
\]

but  which  otherwise  has  the  algebraic  properties  of  a real  number.  An  expression  of  the  form 

a + bi  or  a + ib 

in  which  a and  b are  real  numbers  is  called  a complex  number.  Sometimes  it  will  be  convenient  to  use  a 
single  letter,  typically  z,  to  denote  a complex  number,  in  which  case  we  write 

z = a + bi  or  z = a + ib

The number a is called the real part of z and is denoted by Re(z), and the number b is called the imaginary
part of z and is denoted by Im(z). Thus,

Re(3 + 2i) = 3,  Im(3 + 2i) = 2

Re(1 - 5i) = 1,  Im(1 - 5i) = Im(1 + (-5)i) = -5

Re(7i) = Re(0 + 7i) = 0,  Im(7i) = 7

Re(4) = 4,  Im(4) = Im(4 + 0i) = 0

Two  complex  numbers  are  considered  equal  if  and  only  if  their  real  parts  are  equal  and  their  imaginary  parts 
are  equal;  that  is, 

a + bi  = c + di  if  and  only  if  a=c  and  b = d 

A complex number z = bi whose real part is zero is said to be pure imaginary. A complex number z = a
whose imaginary part is zero is a real number, so the real numbers can be viewed as a subset of the complex
numbers.


Complex numbers are added, subtracted, and multiplied in accordance with the standard rules of algebra but
with $i^2 = -1$:

(a + bi) + (c + di) = (a + c) + (b + d)i  (1)

(a + bi) - (c + di) = (a - c) + (b - d)i  (2)

(a + bi)(c + di) = (ac - bd) + (ad + bc)i  (3)

The multiplication formula is obtained by expanding the left side and using the fact that $i^2 = -1$. Also note
that if b = 0, then the multiplication formula simplifies to

a(c + di) = ac + adi  (4)


The  set  of  complex  numbers  with  these  operations  is  commonly  denoted  by  the  symbol  C and  is  called  the 
complex  number  system. 


EXAMPLE  1 Multiplying  Complex  Numbers 

As  a practical  matter,  it  is  usually  more  convenient  to  compute  products  of  complex  numbers  by 
expansion,  rather  than  substituting  in  3.  For  example, 

\[
(3 - 2i)(4 + 5i) = 12 + 15i - 8i - 10i^2 = (12 + 10) + 7i = 22 + 7i
\]
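
The product in Example 1 can be reproduced in two ways. The sketch below assumes a hypothetical helper `multiply` that applies Formula 3 to real and imaginary parts, and compares it with Python's built-in complex type (which writes the imaginary unit as `j`).

```python
def multiply(a, b, c, d):
    """Formula 3: (a + bi)(c + di) = (ac - bd) + (ad + bc)i, as a (real, imag) pair."""
    return (a * c - b * d, a * d + b * c)

product_pair = multiply(3, -2, 4, 5)       # (3 - 2i)(4 + 5i) via Formula 3
product_builtin = (3 - 2j) * (4 + 5j)      # the same product with built-in complex numbers

print(product_pair, product_builtin)
```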


The  Complex  Plane 

A complex number z = a + bi can be associated with the ordered pair (a, b) of real numbers and represented
geometrically by a point or a vector in the xy-plane (Figure B.1). We call this the complex plane. Points on
the x-axis have an imaginary part of zero and hence correspond to real numbers, whereas points on the y-axis
have a real part of zero and correspond to pure imaginary numbers. Accordingly, we call the x-axis the real
axis and the y-axis the imaginary axis (Figure B.2).


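Figure B.1  [The complex number a + bi shown as a point or vector in the xy-plane.]

Figure B.2  [The complex plane: the real axis carries the real part of z, the imaginary axis carries the imaginary part of z, and z = a + bi.]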


Complex numbers can be added, subtracted, or multiplied by real numbers geometrically by performing these
operations on their associated vectors (Figure B.3, for example). In this sense the complex number system C
is closely related to R^2, the main difference being that complex numbers can be multiplied to produce other
complex numbers, whereas there is no multiplication operation on R^2 that produces other vectors in R^2 (the
dot product produces a scalar, not a vector in R^2).


Figure B.3  [Two panels: the sum of two complex numbers; the difference of two complex numbers.]

If z = a + bi is a complex number, then the complex conjugate of z, or more simply, the conjugate of z, is
denoted by $\bar{z}$ (read, "z bar") and is defined by

\[
\bar{z} = a - bi \tag{5}
\]

Numerically, $\bar{z}$ is obtained from z by reversing the sign of the imaginary part, and geometrically it is obtained
by reflecting the vector for z about the real axis (Figure B.4).


Figure  B.4 


EXAMPLE  2 Some  Complex  Conjugates 


\[
\begin{array}{ll}
z = 3 + 4i & \bar{z} = 3 - 4i \\
z = -2 - 5i & \bar{z} = -2 + 5i \\
z = i & \bar{z} = -i \\
z = 7 & \bar{z} = 7
\end{array}
\]

The last computation in this example illustrates the fact that a real number is equal to its complex
conjugate. More generally, $z = \bar{z}$ if and only if z is a real number.


The following computation shows that the product of a complex number z = a + bi and its conjugate
$\bar{z} = a - bi$ is a nonnegative real number:

\[
z\bar{z} = (a + bi)(a - bi) = a^2 - abi + bai - b^2 i^2 = a^2 + b^2 \tag{6}
\]

You will recognize that

\[
\sqrt{z\bar{z}} = \sqrt{a^2 + b^2}
\]

is the length of the vector corresponding to z (Figure B.5); we call this length the modulus (or absolute value)
of z and denote it by |z|. Thus,

\[
|z| = \sqrt{z\bar{z}} = \sqrt{a^2 + b^2} \tag{7}
\]

Note that if b = 0, then z = a is a real number and

\[
|z| = \sqrt{a^2} = |a|
\]

which tells us that the modulus of a real number is the same as its absolute value as defined in beginning algebra.


Figure B.5  [The vector for z = a + bi, with horizontal component a, vertical component b, and length |z|.]


EXAMPLE  3 Some  Modulus  Computations 


\[
\begin{array}{ll}
z = 3 + 4i & |z| = \sqrt{3^2 + 4^2} = 5 \\
z = -4 - 5i & |z| = \sqrt{(-4)^2 + (-5)^2} = \sqrt{41} \\
z = i & |z| = \sqrt{0^2 + 1^2} = 1
\end{array}
\]


Reciprocals  and  Division 


If z ≠ 0, then the reciprocal (or multiplicative inverse) of z is denoted by 1/z (or $z^{-1}$) and is defined by the
property

\[
\left(\frac{1}{z}\right) z = 1
\]

This equation has a unique solution for 1/z, which we can obtain by multiplying both sides by $\bar{z}$ and using
the fact that $z\bar{z} = |z|^2$ [see 7]. This yields

\[
\frac{1}{z} = \frac{\bar{z}}{|z|^2} \tag{8}
\]


If $z_2 \neq 0$, then the quotient $z_1 / z_2$ is defined to be the product of $z_1$ and $1/z_2$. This yields the formula

\[
\frac{z_1}{z_2} = z_1 \cdot \frac{\bar{z}_2}{|z_2|^2} = \frac{z_1 \bar{z}_2}{|z_2|^2} \tag{9}
\]

Observe that the expression on the right side of 9 results if the numerator and denominator of $z_1 / z_2$ are
multiplied by $\bar{z}_2$. As a practical matter, this is often the best way to perform divisions of complex numbers.


EXAMPLE  4 Division  of  Complex  Numbers 


Let $z_1 = 3 + 4i$ and $z_2 = 1 - 2i$. Express $z_1 / z_2$ in the form a + bi.

We will multiply the numerator and denominator of $z_1 / z_2$ by $\bar{z}_2$. This yields

\[
\frac{z_1}{z_2} = \frac{z_1 \bar{z}_2}{z_2 \bar{z}_2}
= \frac{3 + 4i}{1 - 2i} \cdot \frac{1 + 2i}{1 + 2i}
= \frac{3 + 6i + 4i + 8i^2}{1 - 4i^2}
= \frac{-5 + 10i}{5}
= -1 + 2i
\]
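
The conjugate method of Example 4 translates directly into code. The following sketch assumes a hypothetical helper `divide` that works on integer real and imaginary parts and returns exact fractions; it is an illustration of Formula 9, not a replacement for a library routine.

```python
from fractions import Fraction as F

def divide(a, b, c, d):
    """Return (a + bi)/(c + di) as a (real, imag) pair using Formula 9.

    The numerator is (a + bi)(c - di) = (ac + bd) + (bc - ad)i and the
    denominator is |c + di|^2 = c^2 + d^2.
    """
    denom = c * c + d * d
    if denom == 0:
        raise ZeroDivisionError("division by the complex number 0")
    return (F(a * c + b * d, denom), F(b * c - a * d, denom))

quotient = divide(3, 4, 1, -2)   # (3 + 4i)/(1 - 2i)
print(quotient)
```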


The  following  theorems  list  some  useful  properties  of  the  modulus  and  conjugate  operations. 


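THEOREM B.1

The following results hold for any complex numbers $z$, $z_1$, and $z_2$.

(a)  $\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2$

(b)  $\overline{z_1 - z_2} = \bar{z}_1 - \bar{z}_2$

(c)  $\overline{z_1 z_2} = \bar{z}_1 \bar{z}_2$

(d)  $\overline{z_1 / z_2} = \bar{z}_1 / \bar{z}_2$

(e)  $\bar{\bar{z}} = z$


THEOREM B.2

The following results hold for any complex numbers $z$, $z_1$, and $z_2$.

(a)  $|\bar{z}| = |z|$

(b)  $|z_1 z_2| = |z_1||z_2|$

(c)  $|z_1 / z_2| = |z_1| / |z_2|$

(d)  $|z_1 + z_2| \le |z_1| + |z_2|$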


Polar  Form  of  a Complex  Number 


If z = a + bi is a nonzero complex number, and if $\phi$ is an angle from the real axis to the vector z, then, as
suggested in Figure B.6, the real and imaginary parts of z can be expressed as

\[
a = |z|\cos\phi \quad \text{and} \quad b = |z|\sin\phi \tag{10}
\]

Thus, the complex number z = a + bi can be expressed as

\[
z = |z|(\cos\phi + i\sin\phi) \tag{11}
\]

which is called a polar form of z. The angle $\phi$ in this formula is called an argument of z. The argument of z is
not unique because we can add or subtract any multiple of $2\pi$ to it to obtain a different argument of z.
However, there is only one argument whose radian measure satisfies

\[
-\pi < \phi \le \pi \tag{12}
\]

This is called the principal argument of z.

Figure B.6  [The point (a, b) at angle $\phi$ from the real axis, with $a = |z|\cos\phi$ and $b = |z|\sin\phi$.]


EXAMPLE  5 Polar  Form  of  a Complex  Number 

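Express $z = 1 - \sqrt{3}\,i$ in polar form using the principal argument.

The modulus of z is

\[
|z| = \sqrt{1^2 + (-\sqrt{3})^2} = \sqrt{4} = 2
\]

Thus, it follows from 10 with a = 1 and $b = -\sqrt{3}$ that

\[
1 = 2\cos\phi \quad \text{and} \quad -\sqrt{3} = 2\sin\phi
\]

and this implies that

\[
\cos\phi = \frac{1}{2} \quad \text{and} \quad \sin\phi = -\frac{\sqrt{3}}{2}
\]

The unique angle $\phi$ that satisfies these equations and whose radian measure satisfies 12 is
$\phi = -\pi/3$ (Figure B.7). Thus, a polar form of z is

\[
z = 2\left(\cos\left(-\frac{\pi}{3}\right) + i\sin\left(-\frac{\pi}{3}\right)\right) = 2\left(\cos\frac{\pi}{3} - i\sin\frac{\pi}{3}\right)
\]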
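
The modulus and argument found in Example 5 can be checked with Python's standard `cmath` module. This is a small sketch; `cmath.polar` returns the pair (modulus, phase), with the phase reported as an angle between $-\pi$ and $\pi$, and `cmath.rect` converts back. Results are floating-point approximations.

```python
import cmath
import math

z = complex(1, -math.sqrt(3))          # z = 1 - sqrt(3) i

modulus, phi = cmath.polar(z)          # polar coordinates of z
rebuilt = cmath.rect(modulus, phi)     # |z|(cos(phi) + i sin(phi))

print(modulus, phi, -math.pi / 3, rebuilt)
```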


Figure  B.7 


Geometric  Interpretation  of  Multiplication  and  Division  of  Complex 
Numbers 

We now show how polar forms of complex numbers provide geometric interpretations of multiplication and
division. Let

\[
z_1 = |z_1|(\cos\phi_1 + i\sin\phi_1) \quad \text{and} \quad z_2 = |z_2|(\cos\phi_2 + i\sin\phi_2)
\]

be polar forms of the nonzero complex numbers $z_1$ and $z_2$. Multiplying, we obtain

\[
z_1 z_2 = |z_1||z_2|\left[(\cos\phi_1\cos\phi_2 - \sin\phi_1\sin\phi_2) + i(\sin\phi_1\cos\phi_2 + \cos\phi_1\sin\phi_2)\right]
\]

Now applying the trigonometric identities

\[
\begin{aligned}
\cos(\phi_1 + \phi_2) &= \cos\phi_1\cos\phi_2 - \sin\phi_1\sin\phi_2 \\
\sin(\phi_1 + \phi_2) &= \sin\phi_1\cos\phi_2 + \cos\phi_1\sin\phi_2
\end{aligned}
\]

yields

\[
z_1 z_2 = |z_1||z_2|\left[\cos(\phi_1 + \phi_2) + i\sin(\phi_1 + \phi_2)\right] \tag{13}
\]

which is a polar form of the complex number with modulus $|z_1||z_2|$ and argument $\phi_1 + \phi_2$. Thus, we have
shown that multiplying two complex numbers has the geometric effect of multiplying their moduli and adding
their arguments (Figure B.8).


Figure  B.8 


Similar kinds of computations show that

\[
\frac{z_1}{z_2} = \frac{|z_1|}{|z_2|}\left[\cos(\phi_1 - \phi_2) + i\sin(\phi_1 - \phi_2)\right] \tag{14}
\]

which tells us that dividing complex numbers has the geometric effect of dividing their moduli and subtracting
their arguments (both in the appropriate order).


EXAMPLE  6 Multiplying  and  Dividing  in  Polar  Form 

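Use polar forms of the complex numbers $z_1 = 1 + \sqrt{3}\,i$ and $z_2 = \sqrt{3} + i$ to compute $z_1 z_2$ and
$z_1 / z_2$.

Polar forms of these complex numbers are

\[
z_1 = 2\left(\cos\frac{\pi}{3} + i\sin\frac{\pi}{3}\right) \quad \text{and} \quad z_2 = 2\left(\cos\frac{\pi}{6} + i\sin\frac{\pi}{6}\right)
\]

(verify). Thus, it follows from 13 that

\[
z_1 z_2 = 4\left[\cos\left(\frac{\pi}{3} + \frac{\pi}{6}\right) + i\sin\left(\frac{\pi}{3} + \frac{\pi}{6}\right)\right] = 4\left[\cos\frac{\pi}{2} + i\sin\frac{\pi}{2}\right] = 4i
\]

and from 14 that

\[
\frac{z_1}{z_2} = 1 \cdot \left[\cos\left(\frac{\pi}{3} - \frac{\pi}{6}\right) + i\sin\left(\frac{\pi}{3} - \frac{\pi}{6}\right)\right] = \cos\frac{\pi}{6} + i\sin\frac{\pi}{6} = \frac{\sqrt{3}}{2} + \frac{1}{2}i
\]

As a check, let us calculate $z_1 z_2$ and $z_1 / z_2$ directly:

\[
z_1 z_2 = (1 + \sqrt{3}\,i)(\sqrt{3} + i) = \sqrt{3} + i + 3i + \sqrt{3}\,i^2 = 4i
\]

\[
\frac{z_1}{z_2} = \frac{1 + \sqrt{3}\,i}{\sqrt{3} + i} = \frac{1 + \sqrt{3}\,i}{\sqrt{3} + i} \cdot \frac{\sqrt{3} - i}{\sqrt{3} - i} = \frac{\sqrt{3} - i + 3i - \sqrt{3}\,i^2}{3 - i^2} = \frac{2\sqrt{3} + 2i}{4} = \frac{\sqrt{3}}{2} + \frac{1}{2}i
\]

which agrees with the results obtained using polar forms.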
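
The same check can be run numerically. The sketch below uses the standard `cmath` module to multiply moduli and add arguments (Formula 13) and to divide moduli and subtract arguments (Formula 14), then compares the results with direct complex arithmetic. The variable names are illustrative, and the values are floating-point approximations.

```python
import cmath
import math

z1 = complex(1, math.sqrt(3))      # 1 + sqrt(3) i
z2 = complex(math.sqrt(3), 1)      # sqrt(3) + i

r1, p1 = cmath.polar(z1)           # modulus and argument of z1
r2, p2 = cmath.polar(z2)           # modulus and argument of z2

product_polar = cmath.rect(r1 * r2, p1 + p2)    # Formula 13
quotient_polar = cmath.rect(r1 / r2, p1 - p2)   # Formula 14

print(product_polar, z1 * z2)
print(quotient_polar, z1 / z2)
```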


The complex number i has a modulus of 1 and a principal argument of $\pi/2$. Thus, if z is a complex
number, then iz has the same modulus as z but its argument is greater by $\pi/2$ (= 90°); that is, multiplication
by i has the geometric effect of rotating the vector z counterclockwise by 90° (Figure B.9).


Figure  B.9 


DeMoivre's  Formula 


If n is a positive integer, and if z is a nonzero complex number with polar form

\[
z = |z|(\cos\phi + i\sin\phi)
\]

then raising z to the nth power yields

\[
z^n = \underbrace{z \cdot z \cdots z}_{n\ \text{factors}} = |z|^n\left[\cos(\underbrace{\phi + \phi + \cdots + \phi}_{n\ \text{terms}}) + i\sin(\underbrace{\phi + \phi + \cdots + \phi}_{n\ \text{terms}})\right]
\]

which we can write more succinctly as

\[
z^n = |z|^n(\cos n\phi + i\sin n\phi) \tag{15}
\]

In the special case where |z| = 1 this formula simplifies to

\[
z^n = \cos n\phi + i\sin n\phi
\]

which, using the polar form for z, becomes

\[
(\cos\phi + i\sin\phi)^n = \cos n\phi + i\sin n\phi \tag{16}
\]

This result is called DeMoivre's formula.
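
DeMoivre's formula is easy to spot-check numerically. The sketch below assumes an arbitrary illustrative angle of 0.7 radians and a hypothetical helper `demoivre_gap` that measures the distance between the two sides of Formula 16 for a given n; floating-point arithmetic means the gap is tiny rather than exactly zero.

```python
import math

def demoivre_gap(phi, n):
    """Distance between (cos(phi) + i sin(phi))^n and cos(n phi) + i sin(n phi)."""
    lhs = complex(math.cos(phi), math.sin(phi)) ** n
    rhs = complex(math.cos(n * phi), math.sin(n * phi))
    return abs(lhs - rhs)

# Check the formula for n = 1, ..., 8 at an arbitrary angle.
gaps = [demoivre_gap(0.7, n) for n in range(1, 9)]
print(max(gaps))
```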


Euler's  Formula 

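If $\theta$ is a real number, say the radian measure of some angle, then the complex exponential function $e^{i\theta}$ is
defined to be

\[
e^{i\theta} = \cos\theta + i\sin\theta \tag{17}
\]

which is sometimes called Euler's formula. One motivation for this formula comes from the Maclaurin series
in calculus. Readers who have studied infinite series in calculus can deduce 17 by formally substituting $i\theta$ for
x in the Maclaurin series for $e^x$ and writing

\[
\begin{aligned}
e^{i\theta} &= 1 + i\theta + \frac{(i\theta)^2}{2!} + \frac{(i\theta)^3}{3!} + \frac{(i\theta)^4}{4!} + \frac{(i\theta)^5}{5!} + \frac{(i\theta)^6}{6!} + \cdots \\
&= 1 + i\theta - \frac{\theta^2}{2!} - i\frac{\theta^3}{3!} + \frac{\theta^4}{4!} + i\frac{\theta^5}{5!} - \frac{\theta^6}{6!} + \cdots \\
&= \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \cdots\right) + i\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right) \\
&= \cos\theta + i\sin\theta
\end{aligned}
\]

where the last step follows from the Maclaurin series for $\cos\theta$ and $\sin\theta$.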


If z = a + bi is any complex number, then the complex exponential $e^z$ is defined to be

\[
e^z = e^{a + bi} = e^a e^{ib} = e^a(\cos b + i\sin b)
\]

It can be proved that complex exponentials satisfy the standard laws of exponents. Thus, for example,

\[
e^{z_1} e^{z_2} = e^{z_1 + z_2}, \qquad \frac{e^{z_1}}{e^{z_2}} = e^{z_1 - z_2}, \qquad \frac{1}{e^z} = e^{-z}
\]
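
Euler's formula and the first law of exponents can be illustrated numerically. In this sketch the helper `exp_series` is a hypothetical name for a truncated Maclaurin sum, and the angle and the numbers z1 and z2 are arbitrary illustrative values; `cmath.exp` is the standard-library complex exponential.

```python
import cmath
import math

def exp_series(z, terms=30):
    """Partial sum 1 + z + z^2/2! + ... of the Maclaurin series for e^z."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)   # turns z^k/k! into z^(k+1)/(k+1)!
    return total

theta = 1.2
series_value = exp_series(1j * theta)                      # series with x = i*theta
euler_value = complex(math.cos(theta), math.sin(theta))    # right side of Formula 17

z1, z2 = 0.5 + 1j, -0.25 + 2j
law_gap = abs(cmath.exp(z1) * cmath.exp(z2) - cmath.exp(z1 + z2))

print(abs(series_value - euler_value), law_gap)
```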


Copyright  © 2010  John  Wiley  & Sons,  Inc.  All  rights  reserved. 


Answer  to  Exercises 


Exercise  Set  1.1 

(a),  (c),  and  (f)  are  linear  equations;  (b),  (d)  and  (e)  are  not  linear  equations 
(a)  and  (d)  are  linear  systems;  (b)  and  (c)  are  not  linear  systems 
(a)  and  (d)  are  both  consistent 

(a),  (d),  and  (e)  are  solutions;  (b)  and  (c)  are  not  solutions 


y = t 


11. 


b. 

*1 

= 

}■-§ 

*2 

— 

r 

*3 

= 

s 

X4 

= 

* 

a. 

2x\ 

= 

0 

3*1 

-4*2  = 

0 

*2 

= 

1 

b. 

3*1 

— 2x3 

= 5 

7*1 

4= 

*2 

4=  4x3 

= -3 

-2x2 

4=  x3 

= 7 

c. 

7*i 

+ 

2x2 

+ x3  - 

m 

II 

CO 

*1 

+ 

2x2 

4-  4x3 

= 1 

d. 

*1 

= 7 

*2 

= -2 

x3  =3 

*4=4 


13. 


-2  6" 

3 8 

9 -3 

b.  [6  -1  3 4" 

0 5 -1  1_ 

020-310 
_3-l  ioo-l 
6 2-1  2-3  6 

d.  [1  0 0 0 -1  7] 


True/False  1.1 

True 
False 
True 
True 
False 
(f)  False 
True 
False 

Exercise  Set  1.2 

a.  Both 
Both 
Both 
Both 
Both 
Both 

Row  echelon 

a.  x1 = -37, x2 = -8, x3 = 5

b.  x1 = 13t - 10, x2 = 13t - 5, x3 = -t + 2, x4 = t
c.  x1 = -7s + 2t - 11, x2 = s, x3 = -3t - 4, x4 = -3t + 9, x5 = t


3. 


Inconsistent 

5.  x1 = 3, x2 = 1, x3 = 2


7.  x = t - 1, y = 2s, z = s, w = t
9.  x1 = 3, x2 = 1, x3 = 2
11.  x = t - 1, y = 2s, z = s, w = t
Has nontrivial solutions
Has nontrivial solutions
17.  x1 = 0, x2 = 0, x3 = 0
19.  *1  = — s,  *2=  *3  = 4 s,  X4  = t 

21.  w = t, x = -t, y = t, z = 0
23.  I1 = -1, I2 = 0, I3 = 1, I4 = 2

If a = 4, there are infinitely many solutions; if a = -4, there are no solutions; if a ≠ ±4, there is exactly one solution.

If a = 3, there are infinitely many solutions; if a = -3, there are no solutions; if a ≠ ±3, there is exactly one solution.

29.  x = 2a_k  v = _ £ + 26 
3 9’  7 3^9 


31. 


0 ?]and 

"l  0" 

_0  1_ 

* = ± 1,  y 

= ±i! 

are  possible  answers. 


37.  (3  = 1,  2>  = —6,  c — 2,  ii  = 10 

The  nonhomogeneous  system  will  have  exactly  one  solution. 

True/False  1.2 

True 
False 
(c)  False 
True 
True 
(f)  False 
True 
False 
False 

Exercise  Set  1.3 

1*  a.  Undefined 
b 4x2 
c Undefined 
d Undefined 

e.  5x5 

f.  5x2 

g.  Undefined 

h 5x2 


b. 


d. 


7 

-2 

7 

-5 

0 

-1 

15 

-5 

5 


6 5 
1 3 
3 7 

4 

-1 

1 

0 

10 

5 


' _7  —28  -14] 
—21  -7  -35 J 


Undefined 


f. 


22  -6  8 

-2 

10 
-39 
9 


-33 


4 6 
0 4 
-21  -24 
-6  -15 
-12  -30 


h. 

o 

o 

0 0 

0 0 

5 

j- 

-25 

k. 

168 

Undefined 

a. 

" 12 

-3" 

-4 

5 

4 

1 

b.  Undefined 

c. 

"42  108 

75" 

12  ■ 

-3 

21 

36 

78 

63 

d. 

" 3 

45 

9" 

11  - 

-11 

17 

_ 7 

17 

13 

e. 

" 3 

45 

9" 

11  - 

-11 

17 

7 

17 

13 

f. 

"21  17" 

17  35_ 

g- 

' 0 - 

-2 

11" 

12 

1 

8_ 

h. 

"12 

6 

9" 

48  - 

-20 

14 

24 

8 

16 

61 

j.  35 

k.  28 
99 


7. 


a. 

b. 


c. 


d. 


f. 


[6741  41] 
[63  67  57] 

"41" 

21 

67 

' 6" 

6 

63_ 

’24  56  97] 

"76" 

98 

97 


a. 

-3 

3 

■2 

12 

3 

-2 

7 

76 

3 

-2 

7 

48 

= 3 

6 

+ 6 

5 

29 

= - 

-2 

6 

+ 5 

5 

+ 4 

4 

98 

= 7 

6 

+ 4 

5 

+ 9 

4 

24 

0 

4 

56 

0 

4 

9 

97 

0 

4 

9 

b. 

64 

6 

4 

14 

6 

-2 

4 

38 

6 

-2 

4 

21 

= 6 

0 

+ 7 

3 

22 

= 

-2 

0 

4= 

1 

+ 7 

3 

18 

= 4 

0 

4=  3 

1 

+ 5 

3 

77 

7 

5 

28 

7 

7 

5 

74 

7 

7 

5 

H-  a. 

"2  -3  5" 

■*r 

7" 

9 -1  1 

*2 

= 

-1 

1 5 4 

*3 

0 

"4 

0 

-3 

f 

"*l" 

T 

5 

1 

0 

-8 

*2 

3 

2 

-5 

9 

-1 

*3 

0 

0 

3 

-1 

7 

x4 

2 

13. 


5x\  4=  6x2  — 7*2 

—x\  — 2x2  + 3^3 

4x2  ~ *3 


2 

0 

3 


2 

2 

-9 


b. 

*1 

*2 

+ 

*3 

= 

2x\ 

+ 

3*2 

= 

5*1 

- 

3*2 

- 

6x3 

= 

-1 

II 

b = 

-6, 

c = 

-1 , d = 

1 

a. 

*11 

0 

0 

0 

0 

0 

0 

*22 

0 

0 

0 

0 

0 

0 

*33 

0 

0 

0 

0 

0 

0 

*44 

0 

0 

0 

0 

0 

0 

*55 

0 

0 

0 

0 

0 

0 

*66 

b. 

’*11 

*12 

*13 

*14 

*15 

*16 

0 

*22 

*23 

*24 

*25 

*26 

0 

0 

*33 

*34 

*35 

*36 

0 

0 

0 

*44 

*45 

*46 

0 

0 

0 

0 

*55 

*56 

0 

0 

0 

0 

0 

*66 

c. 

*11 

0 

0 

0 

0 

0 

*21 

*22 

0 

0 

0 

0 

*31 

*32 

*33 

0 

0 

0 

*41 

*42 

*43 

*44 

0 

0 

*51 

*52 

*53 

*54 

*55 

0 

*61 

*62 

*63 

*64 

*65 

*66 

d. 

*11 

*12 

0 

0 

0 

0 

*21 

*22 

*23 

0 

0 

0 

0 

*32 

*33 

*34 

0 

0 

0 

0 

*43 

*44 

*45 

0 

0 

0 

0 

*54 

*55 

*56 

0 

0 

0 

0 

*65 

*66 

,/*l\  /*l+*2\ 

7 r2/  l *2  j 


b. 


c. 


* /(x) 


!y 

I - 


/U).=  X , 
1 2 


I 


fix) 


27. 

'1  1 0" 

One;  namely,  A — 

1 -1  0 

0 0 0 

a'  \]  11 

and 

h "ii 

L-i  -ij 

b. 


Four; 


True/False  1.3 

l)  True 
False 
False 
l)  False 
) True 
False 
;)  False 
0 True 
True 
True 
True 
False 
True 
i)  True 
•)  False 

Exercise  Set  1 .4 


\f5  Ol 

\-f5  0 

{5  0 

-{5  0 

[o  3J 

[ 0 3_ 

. 0 ~3. 

0 —3 

5. 


1 J_ 

5 20 

5 10 

2 ° 

0 i 


he'+n 

icz+o 


15. 


4 = 


f 1 

1 1 
7 7 


17. 


19. 


_9_ 
" 13 


13 

_2_  __6_ 
13  13 


b. 


41  15 
30  11 
11  -15 
-30  41 

6 2 
4 2 


d.n  i] 

.2  -lj 

) 71 

1 6j 


1 


21. 


b. 


d. 


27. 


1 

*11 


39  13 

26  13 

'27  0 0 

0 26  -18 
0 18  26 

27  0 0 

0 0.026  0.018 
_ 0 -0.018  0.026 
~4  0 0 

0 —5  -12 
_0  12  -5 

'1  0 0^ 

0-3  3 

_0  -3  -3 
16  0 0 
0 -14  -15 
0 15  -14 

'25  0 0~ 

0 32  -24 
0 24  32 

0 ■ • • 0 

-L-  ■ ■ ■ 0 

*22 

0 ■ ■ ■ 

*«« 


L D=CA-1B-1A-2BC2{bT)j  1 


33.  B~ 
35. 


a-1. 


37. 


A~l  = 


ill 

“2  2 2 

1 _1  1 

2 2 2 

1 1 _! 

2 2 2 

II  1 

"2  2 2 

1 0 0 


39.  *1=-L  xo  = ll 


23’ 


x2  = 


23 


41.  x\  — ro  = A. 

Xl  ir  2 li 

True/False  1.4 


False 
False 
False 
False 
False 
True 
g)  True 
True 
False 
j)  True 
False 


Exercise  Set  1.5 

a.  Elementary 
Not  elementary 
c.  Not  elementary 


t'Ol'vJ 


Not  elementary 


Add  3 times  row  2 to  row  1 


J1  3 
v 


Multiply  row  1 by  — 


Add  5 times  row  1 to  row  3: 


0 0 

0 1 0 
0 0 1 
1 0 0 
0 1 0 
5 0 1 


Swap  rows  1 and  3: 


0 0 10 
0 10  0 
10  0 0 
0 0 0 1 


a r 3 -6 

Swap  rows  1 and  2:  S./4  = 

-6  -6~ 
5 -1_ 

b. 

2 -1 

0 

-4 

-4 

Add  —3  times  row  2 to  row  3:  EA  = 

1 -3 

-1 

5 

3 

-1  9 

4 

-12 

-10 

c.  [13  28' 

Add  4 times  row  3 to  row  1 : EA  = 


a. 

0 

0 

r 

0 

1 

0 

1 

0 

0 

b. 

0 

0 

r 

0 

1 

0 

1 

0 

0 

c. 

1 

0 0 

0 

1 0 

■2 

0 1 

d. 

'1 

0 

o' 

0 

1 

0 

2 

0 

1 

-7 

1 

n 

2 

-1 

J 

11.  2 3 

7 7 

3 1 
_7  7 

13.  [ 3 
2 

-1 

2 


li 

10  5 

1 1 
_7_  2 

10  5 


15.  No  inverse 


17. 


1 

2 

1 

2 

1 

2 


-1 

0 


_1  1 

2 2 

1 1 

2 2 

1 _1 

2 2 

0 -3 

1 0 
= 1 1 


19. 


21. 


23. 


25. 


29. 


1 

1 

; o 

4 

2 

1 

1 

3 

0 

8 

4 

2 

0 

0 

1 

0 

2 

1 

1 

1 

1 

40 

'20 

'10 

5 

7 

5 

5 

r 

12 

24 

8 

4 

5 

5 

1 

1 

6 

12 

4 

2 

5 

5 

5 

1 

12 

24 

8 

4 

1 

1 

1 

1 

12 

24 

8 

4 

a. 

1 

0 

C 

) 0 

*1 

0 

J_ 

0 0 

*2 

0 

0 

J 

- 0 

k 3 

0 

0 

C 

i -L 

£4 

b. 

'l 

.1 

0 

0 

k 

k 

0 

1 

0 

0 

c 

) 

0 

1 

_1 

k 

k 

0 

0 

0 

1 

^ * 0,  1 

-3 

1 

1 

0 

1 1 

-4  0 

2 

2 

0 

2 

0 1 

0 1 

!?] 


"l 

0 

0~ 

"l 

0 

0~ 

0 

1 

3 

0 

4 

0 

0 

0 

1 

0 

0 

1 

31.  10-2]  [10-2 
0 4 3 = 0 1 0 

0 0 1 0 0 1 

33.  _1  1 

4 8 

1 1 

4 8 

35.  1 0 2 

0 1 "I 

4 4 

0 0 1 

Add  — 1 times  the  first  row  to  the  second  row.  Add  — ] times  the  first  row  to  the  third  row.  Add  — \ times  the  second  row  to  the  first  row.  Add  the  second  row  to 
the  third  row. 


1 O' 

“7  0 

‘1  -f 

'1  O' 

-1  1_ 

4 

0 1_ 

_0  1_ 

,°  i 

"1 

0 

1 
4 

o' 

'1 

0 

o' 

"1 

0 

2" 

= 

0 

0 

0 

1 

-3 

0 

1 

0 

0 

0 

1 

0 

0 

1 

0 

0 

1 

True/False  1.5 

(a) False
(b) True
(c) True
(d) True
(e) True
(f) True
(g) False


Exercise  Set  1 .6 
1. x1 = 3, x2 = -1
3. x1 = -1, x2 = 4, x3 = -7
5. x = 1, y = 5, z = -1
7. x1 = 2b1 - 5b2, x2 = -b1 + 3b2


i-  *,  = 22. 
1?, 

17’ 


*1 : 


*2  =17 

11 

17 


*2  z 


9. 


11. 


L *1  = 


11  n = 


15 

34 


> x2  = 


A_ 
15 
28 

,1  = - *2  = - 

iiLxl=ii  *9=n 
X1  15  ’ *2  15 

1 3 

5 


*1=  - *2  = 


13. No conditions on b1 and b2
15. b3 = b1 - b2

17. b1 = b3 + b4, b2 = 2b3 + b4


19. 


^r= 


11  12  -3  27  26 

—6  -8  1 -18  -17 

— 15  -21  9 -38  -35 


True/False  1.6 

(a) True
(b) True
(c) True
(d) True
(e) True
(f) True
(g) True

Exercise  Set  1 .7 

1. 


-k  0 


0 - 

-1 

0 

0 

6 


1 

5 

0 0 


0 3 
3 

4 -1 
4 10 

-15  10 

2 -10 
18  -6 


A2  = 


11. 


0 20  -20' 

6 0 6 

1 

i 

: ~6. 

"1  O' 

r 

ih> 

L 

II 

, A~k  = 

1 / (—2)* 


o -7T 


'o 

o 

0 9 0 

II 

f 

o 

o 

O'* 

0 4* 


Not  symmetric 
Symmetric 
Not  symmetric 
19.  Not  symmetric 
Not  invertible 
23.  = - 8 

25.  x*  1,  -2,4 


27. 


35. 


1 0 0 
0-1  0 
0 0-1 

a.  Yes 

No  (unless  « = 1) 
c.  Yes 

No  (unless  n = 1) 


39. 


43. 


0 0 —8 
0 0-4 
8 4 0 

1 10" 
0 -2 


A = 


True/False  1.7 

l)  True 
o False 
False 
l)  True 
) True 
False 
;)  False 
i)  True 
True 
False 
0 False 
False 
True 


(i) 

(j) 


Exercise  Set  1.8 

1. 


50 


3. a. x3 - x4 = -500, -x1 + x4 = 100, x1 - x2 = 300, x2 - x3 = 100

b. x1 = -100 + t, x2 = -400 + t, x3 = -500 + t, x4 = t

c. For all rates to be nonnegative, we need t ≥ 500 cars per hour; with t = 500, x1 = 400, x2 = 100, x3 = 0, x4 = 500
5.  h = ^-A,  /2  = -|a,  /3=+A 

7-  li=IA  = I5  = l6  = l A,  /2  = /3  = 0A 

9. x1 = 1, x2 = 5, x3 = 3, and x4 = 4; the balanced equation is C3H8 + 5O2 → 3CO2 + 4H2O
11. x1 = x2 = x3 = x4 = t; the balanced equation is CH3COF + H2O → CH3COOH + HF
13. p(x) = x^2 - 2x + 2
15-  pW  = l+n*-I,3 

17-  9 

a. With k as a parameter, p(x) = 1 + kx + (1 - k)x^2, where -∞ < k < ∞.

The  graphs  for  k = 0,  1,2,  and  3 are  shown. 


[Figure: graphs of p(x) for k = 0, 1, 2, and 3]

True/False  1.8 

True 

False 

True 

False 

False 

Exercise  Set  1.9 

a '0.50  0.25 
0.25  Oioj 


b.  I"  $ 25,290 
$ 22,  581 


a.  [0.1  0.6  0.4 

0.3  0.2  0.3 
0.4  0.1  0.2 

b.  [ $ 31,500" 

$ 26,  500 
$ 26,  300 

5.  [ 123.08] 

|_202. 56  J 

True/False  1.9 

False 

True 

False 

True 

True 


Chapter  1 Supplementary  Exercises 

+ *4  = 1 


1.  3^1  - x2 
2x\ 


+ 3x3  + 3x4  = — 1 
■t- 


3 3 1 9 15 

*1=r-?“2’  *3 


~ " 2"  2 

2x1 - 4x2 + x3 = 6
-4x1 + 3x3 = -1

x2 - x3 = 3

5-  r'-fz  + f*  /=-±X  + ly 

7.  x =4,  y = 2,  z = 3 
9. a. a ≠ 0, b ≠ 2
b. a ≠ 0, b = 2
c. a = 0, b = 2
d. a = 0, b ≠ 2


11. 


£ 


13. 


r°  2 

h i 

a-  X- 

b.  f 1 -21 

[3  ij 


X-- 


-1  3 -1 
6 0 1 


113  160 

‘ 37  37 

_20  _46_ 

37  37 


15. 


a = 1, b = -2, c = 3


Exercise  Set  2.1 

1. M11 = 29, C11 = 29
M12 = 21, C12 = -21
M13 = 27, C13 = 27
M21 = -11, C21 = 11
M22 = 13, C22 = 13
M23 = -5, C23 = 5
M31 = -19, C31 = -19
M32 = -19, C32 = 19
M33 = 19, C33 = 19
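
Every pair in the list above obeys the sign rule C_ij = (-1)^(i+j) M_ij. The following is a minimal Python sketch of that rule, not the book's own method. The matrix A is an assumption chosen for illustration because it reproduces the minors listed above, and the helper names det, minor, and cofactor are hypothetical.

```python
def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def minor(m, i, j):
    """Minor M_ij (1-based): determinant after deleting row i and column j."""
    sub = [row[:j - 1] + row[j:] for k, row in enumerate(m, start=1) if k != i]
    return det(sub)


def cofactor(m, i, j):
    """Cofactor C_ij = (-1)^(i+j) * M_ij (1-based indices)."""
    return (-1) ** (i + j) * minor(m, i, j)


# Assumed matrix (illustrative only); it yields the minors in the list above.
A = [[1, -2, 3], [6, 7, -1], [-3, 1, 4]]

for i in range(1, 4):
    for j in range(1, 4):
        print(f"M{i}{j} = {minor(A, i, j)}, C{i}{j} = {cofactor(A, i, j)}")
```

The sign flips exactly when i + j is odd, which is why M12, M21, M23, and M32 change sign while the other entries do not.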

a. M13 = 0, C13 = 0

b. M23 = -96, C23 = 96

c. M22 = -48, C22 = -48


s,  X4 


d. M21 = 72, C21 = -72


5. 

" 2 

5 ' 

22; 

11 

22 

1 

3 

11 

22 

7. 

2 

7 

59; 

59 

59 

7 

5 

59 

59 

9. a^2 - 5a + 21
11. -65
13. -123
15. λ = 1 or -3
17. λ = 1 or -1
19. (all parts) -123

21. -40

23.  0 
25.  -240 
27.  “I 
29.  0 
31.  6 

33. The determinant is sin^2 θ + cos^2 θ = 1.

35. d2 = d1 + λ


True/False  2.1 

False 

False 

True 

True 

True 

False 

False 

False 

True 

Exercise  Set  2.2 

5.  -5 
7.  -1 

9.  1 

5 

33 

6 

17.  -2 


Exercises  14:  39;  Exercise  15:  6;  Exercise  16:  — Exercise  17:  —2 

6 


21. -6
23. 72
25. -6
27. 18

True/False  2.2 

True 

True 

False 

False 

True 

True 

Exercise  Set  2.3 

Invertible 

Invertible 


Not  invertible 
13.  Invertible 

15-  ,t.  1±j[E 
**  2 
17. k ≠ -1

19. 


3 -5  -5 
=3  4 5 

2 =2  =3 


21. 


23. 


1 2 

2 2 


- 1 


0 0^ 


-4  3 

2 =1 

-7 

6 


0 -1 
0 0 
0-1  8 
0 1 -7 


25.  * = J_  „ = _2_  z=-± 

A ii  - / 


ir 


lr 


n 


27  30  38 

*1  = “ "fp  *2  = - “jp  *3  = 

29.  Cramer's  rule  does  not  apply. 

31.  y = 0 


a.  —189 

b.  _1 

7 

c.  _2 

7 

d.  L 

56 

e.  7 

189 

b.  1 

7 

c.  I 

7 

d.  _L 

56 


40 

11 


True/False  2.3 

(a) False
(b) False
(c) True
(d) False
(e) True
(f) True
(g) True
(h) True
(i) True
(j) True
(k) True
(l) False

Chapter  2 Supplementary  Exercises 
1.  -18 

3 24 

5.  -10 

7.  329 

Exercise  3:  24;  Exercise  4:  0;  Exercise  5:  -10;  Exercise  6:  —48 
The matrices in Exercises 1-3 are invertible; the matrix in Exercise 4 is not.
13. -b^2 + 5b - 21
15.  -120 


17. 


19. 


21. 


23. 


1 1 

‘6  9 

1 2 

6 9 


1 

1 

3 

8 

8 

8 

1 

5 

1 

8 

24 

24 

1 

7 

1 

4 

"12 

12 

1 

2 

1 ' 

5 

5 

10 

1 

3 

2 

5 

5 

5 

2 

6 

3 

5 

5 

10 

10 

2 

52 

27 

329 

329 

329 

329 

55 

11 

43 

16 

329 

329 

329 

329 

3 

10 

25 

6 

47 

47 

47 

47 

31 

72 

102 

15 

'329 

329 

329 

329 

s.  *'-!*+f* 


29. 


(b)  C0S,=  Z+Z^1 


2flC 


, cos  7 = 


a2+&2-c2 

2a& 


Exercise  Set  3.1 

1.  « 


d. 


f 


fM-3.4.5) 

Ff| 


d”; 

(3,  f 4. 5)J 


I I I I I I w 

vftj! 

LI-.17 

(3,4,  -5) 
(-3,  -4. 5)A  * 

4k 

JfcK 

-|  (J-3.4.-5) 


i 
i 
i 


b. 


c. 


d. 


e. 


f. 


b. 


c. 


a. P1P2 = (-1, 3)
b. P1P2 = (-3, 6, 1)

The  terminal  point  is  B(2,  3). 

The  initial  point  is  A(  — 2,  — 2,  — 1 ) . 

a.  u=  (— 1,  2,  — 4)  is  one  possible  answer. 

b.  u = (7,  — 2,  — 6)  is  one  possible  answer. 

13. a. u + w = (1, -4)

b. v - 3u = (-12, 8)

c. 2(u - 5w) = (38, 28)

d. 3v - 2(u + 2w) = (4, 29)

e. -3(w - 2u + v) = (33, -12)

f. (-2u - v) - 5(v + 3w) = (37, 17)
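
Each answer in 13 is built from two componentwise operations, vector addition and scalar multiplication. A small Python sketch of those operations follows. The vectors u, v, and w are assumptions chosen because they reproduce the results listed above, and the helper names add, scale, and sub are hypothetical.

```python
def add(a, b):
    """Componentwise sum of two vectors of equal length."""
    return tuple(x + y for x, y in zip(a, b))


def scale(k, a):
    """Scalar multiple k * a."""
    return tuple(k * x for x in a)


def sub(a, b):
    """Difference a - b, written as a + (-1) * b."""
    return add(a, scale(-1, b))


# Assumed vectors (illustrative only); they match the answers listed above.
u, v, w = (4, -1), (0, 5), (-3, -3)

print(add(u, w))                                          # u + w
print(sub(v, scale(3, u)))                                # v - 3u
print(scale(2, sub(u, scale(5, w))))                      # 2(u - 5w)
print(sub(scale(3, v), scale(2, add(u, scale(2, w)))))    # 3v - 2(u + 2w)
```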

a.  (-1,9,  -11,1) 

b.  (22,53,  - 19,  14) 

c.  (-13,13,  -36,  -2) 

d (_90,  - 114,60,  -36) 

e.  (-9,  -5,  -5,  -3) 

f.  (27,  29,  - 27,  9) 

a. w - u = (-9, 3, -3, -8, 5)

b. 2v + 3u = (13, -5, 14, 13, -9)


c. -w + 3(v - u) = (-14, -2, 24, 2, 7)

d. 5(-v + 4u - w) = (125, -25, -20, 75, -70)

e. -2(3w + v) + (2u + w) = (32, -10, 1, 27, -16)

f-  i(w-5v  + 2u)+y=(|,  f.  -12,  -§.  -2) 

19. a. v - w = (-2, 1, -4, -2, 7)

b. 6u + 2v = (-10, 6, -4, 26, 28)

c. (2u - 7w) - (8v + u) = (-77, 8, 94, -25, 23)


Not  parallel 

Parallel 

Parallel 

25. a = 3, b = -1
27. c1 = 2, c2 = -1, c3 = 5
29. c1 = 1, c2 = 1, c3 = -1, c4 = 1

a.  (9_  1 n 

U'  2’  2 J 

b.  /23  _9  n 

U ’ 4’  4) 

True/False  3.1 

(a) False
(b) False
(c) False
(d) True
(e) True
(f) False
(g) False
(h) True
(i) False
(j) True
(k) False

Exercise  Set  3.2 


3. a. ||u + v|| = √83

b. ||u|| + ||v|| = √17 + √26

c. ||-2u + 2v|| = 2√3

d. ||3u - 5v + w|| = √466

a. ||3u - 5v + w|| = √2570

b. ||3u|| - 5||v|| + ||w|| = 3√46 - 10√21 + √42

c. ||-||u||v|| = 2√966

7.  = l k=_l 

K r K i 

a. u · v = -8, u · u = 26, v · v = 24

b. u · v = 0, u · u = 54, v · v = 21

a. ||u - v|| = √14

b. ||u - v|| = √59

c. ||u - v|| = √677


11. 


13. 


a-  cos  9 = 


15 


/27/I7 ; 


0 is  acute 


b cos  9 = — 


{s{A5  ; 


0 is  obtuse 


c-  cos  9 = — 


136 


/225/Tio ; 


0 is  obtuse 


15. 


a • b = 45 


17. 


19. 


a. u · (v · w) does not make sense because v · w is a scalar.

b. u · (v + w) makes sense.

||u · v|| does not make sense because the quantity inside the norm is a scalar.
(u · v) - ||u|| makes sense since the terms are both scalars.


(4  -I) 


(5/2  ■ 5/2) 

.3  i £ 

4’  2’  4 


(_ J 2 3 4 5_ 

\fi5’  \[55’  /55’  /55’  /55 


23. 


;0  = - 


11 


/%2 


b-  cos  0= =- 

/10 


25. 


cos  0 = 0 
cos  0 = 0 

|u · v| = 10, ||u|| ||v|| = √13 √17 ≈ 14.866
|u · v| = 7, ||u|| ||v|| = √10 √14 ≈ 11.832
|u · v| = 5, ||u|| ||v|| = (3)(2) = 6
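
The three lines above each compare |u · v| with ||u|| ||v||, which is the Cauchy-Schwarz inequality |u · v| ≤ ||u|| ||v||. The sketch below checks the inequality numerically. The three vector pairs are assumptions chosen for illustration because they reproduce the numbers listed above, and the helper names dot and norm are hypothetical.

```python
import math


def dot(a, b):
    """Euclidean dot product."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean norm, the square root of a . a."""
    return math.sqrt(dot(a, a))


# Assumed vector pairs (illustrative only).
pairs = [
    ((3, 2), (4, -1)),
    ((-3, 1, 0), (2, -1, 3)),
    ((0, 2, 2, 1), (1, 1, 1, 1)),
]

for u, v in pairs:
    lhs = abs(dot(u, v))
    rhs = norm(u) * norm(v)
    print(lhs, round(rhs, 3), lhs <= rhs)
```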


27. A sphere of radius 1 centered at (x0, y0, z0).

True/False  3.2 

(a) True
(b) True
(c) False
(d) True
(e) True
(f) False
(g) False
(h) False
(i) True
(j) True

Exercise  Set  3.3 

a.  Orthogonal 
Not  orthogonal 
Not  orthogonal 
Not  orthogonal 


Not  an  orthogonal  set 
Orthogonal  set 
Orthogonal  set 
Not  an  orthogonal  set 


5. 


± 


Ik  k 


Yes 


9. -2(x + 1) + (y - 3) - (z + 2) = 0
11.  2z  = 0 
13.  Not  parallel 
Parallel 

Not  perpendicular 

19-  a.  2 

5 

b.  _1S_ 

fn 

21.  (0.  0)  (6,  2) 

23.  /_!£  0 -M'l  (55  . _m 
\ 13’  ’ 13/  U3  13  J 

(»•  !•  -?)•  (>•  i I) 

27.  1 _i  J_  _ M /£  6 _9_  21_\ 

\5’  5’  10'  10/  \5’  5’  10’  10 } 

29.  1 
31.  _J_ 

33.  I 
3 

35.  1 

/29 

37.  JJ_ 

& 

39.  0 (The  planes  coincide.) 

b)  cos  ft  = -|Ar-  C0S7=irT 

IMI  ||v|| 

True/False  3.3 

(a)  True 
True 
(c)  True 
True 

(e)  True 

(f)  False 

(g)  False 

Exercise  Set  3.4 

1. Vector equation: (x, y) = (-4, 1) + t(0, -8);

parametric equations: x = -4, y = 1 - 8t

3. Vector equation: (x, y, z) = t(-3, 0, 1);

parametric equations: x = -3t, y = 0, z = t
Point: (3, -6); parallel vector: (-5, -1)

Point: (4, 6); parallel vector: (-6, -6)

9. Vector equation: (x, y, z) = (-3, 1, 0) + t1(0, -3, 6) + t2(-5, 1, 2);

parametric equations: x = -3 - 5t2, y = 1 - 3t1 + t2, z = 6t1 + 2t2

11. Vector equation: (x, y, z) = (-1, 1, 4) + t1(6, -1, 0) + t2(-1, 3, 1);

parametric equations: x = -1 + 6t1 - t2, y = 1 - t1 + 3t2, z = 4 + t2
A possible answer is vector equation: (x, y) = t(3, 2);

parametric equations: x = 3t, y = 2t

A possible answer is vector equation: (x, y, z) = t1(0, 1, 0) + t2(5, 0, 4);

parametric equations: x = 5t2, y = t1, z = 4t2
17. x1 = -s - t, x2 = s, x3 = t
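
The parametric equations of a plane are the vector equation read off one component at a time. As a hedged illustration, the sketch below evaluates the vector form from answer 9, with point (-3, 1, 0) and direction vectors (0, -3, 6) and (-5, 1, 2), and compares it with the parametric form. The function names plane_point and parametric are hypothetical.

```python
def plane_point(p0, v1, v2, t1, t2):
    """Point p0 + t1*v1 + t2*v2 on the plane through p0 parallel to v1 and v2."""
    return tuple(p + t1 * a + t2 * b for p, a, b in zip(p0, v1, v2))


def parametric(t1, t2):
    """Parametric equations of answer 9, one expression per coordinate."""
    return (-3 - 5 * t2, 1 - 3 * t1 + t2, 6 * t1 + 2 * t2)


p0, v1, v2 = (-3, 1, 0), (0, -3, 6), (-5, 1, 2)

for t1, t2 in [(0, 0), (1, 0), (0, 1), (1, 1), (2, -3)]:
    print((t1, t2), plane_point(p0, v1, v2, t1, t2), parametric(t1, t2))
```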

9*  x\  = yr-^ps-y*,  X2=  -yr+ys+y*,  *3  = r,  Z4  = s,  *5  = * 


21. 


a. (1, 0, 0) + s(-1, 1, 0) + t(-1, 0, 1)

a plane in R^3 passing through P(1, 0, 0) and parallel to (-1, 1, 0) and (-1, 0, 1)


23. a. x + y + z = 0

-2x + 3y = 0

a line through the origin in R^3

c. x = -(3/5)t, y = -(2/5)t, z = t

25.  a 2,1, 

a X\  = - yS  + y*,  X2  = S,  *3  =t 

c-  *1  = 1 — -|s 4=  y*,  ^2  = S,  *3  = 1 + 1 

7-  xi  = j — ys  — y*,  *2  = s,  X2  = t,  *4=1;  The  general  solution  of  the  associated  homogeneous  system  is  x\  - 
particular  solution  of  the  given  system  is  x i = -y,  *2  = 0,  *3  = 0,  7:4  = 1. 

True/False  3.4 

True 
False 
(c)  True 
True 
(e)  False 
True 

Exercise  Set  3.5 

1.  a.  (32,  -6,-4) 

b.  (-14,  -20,  -82) 
c.  (27,40,  -42) 
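
Parts b and c above give different vectors, which reflects the fact that the cross product is not associative: u × (v × w) and (u × v) × w generally differ. The sketch below shows this under an assumption: the vectors u, v, and w are chosen for illustration because they reproduce the three answers listed, and the function name cross is hypothetical.

```python
def cross(a, b):
    """Cross product of two vectors in 3-space."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


# Assumed vectors (illustrative only).
u, v, w = (3, 2, -1), (0, 2, -3), (2, 6, 7)

print(cross(v, w))             # v x w
print(cross(u, cross(v, w)))   # u x (v x w)
print(cross(cross(u, v), w))   # (u x v) x w
```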

3.  (18,36,  -18) 

5.  ( - 3,  9,  - 3) 

7. √59
9.  fm 
3 

13.  7 

15.  i/374 


4 1 

“I*”  3*'  X2  = s’  *3  = 


16 

The  vectors  do  not  lie  in  the  same  plane. 
21.  -92 

23.  abc 
25.  a -3 
b 3 
c 3 

27 ’ a.  ^26_ 

2 

b \j26_ 

3 

29.  2(vxu) 


37. 


a 17 

6 

b 1 

2 


True/False  3.5 

True 

True 

False 

True 

False 

False 


t,  X4  = 0.  A 


Chapter  3 Supplementary  Exercises 


a. 3v - 2u = (13, -3, 10)

b. ||u + v + w|| = √70

c.  fm 

d’  projw“=  — 27"(2’  -5,  -5) 


profeu  = - 


e. u · (v × w) = -122

f. (-5v + w) × ((u · v)w) = (-3150, -2430, 1170)

3. a. 3v - 2u = (-5, -12, 20, -2)

b. ||u + v + w|| = √106

c.  2810 

d'  ProJwu=  — ^y(9,  1,  -6,  -6) 

Not  an  orthogonal  set 

A line  through  the  origin,  perpendicular  to  the  given  vector. 

A plane  through  the  origin,  perpendicular  to  the  given  vector. 

{0}  (the  origin) 

A line  through  the  origin,  perpendicular  to  the  plane  containing  the  two  noncollinear  vectors. 


17. Vector equation: (x, y, z) = (-2, 1, 3) + t1(1, -2, -2) + t2(5, -1, -5);

parametric equations: x = -2 + t1 + 5t2, y = 1 - 2t1 - t2, z = 3 - 2t1 - 5t2
19. Vector equation: (x, y) = (0, -3) + t(8, -1);

parametric equations: x = 8t, y = -3 - t

A possible answer is vector equation: (x, y) = (0, -5) + t(1, 3); parametric equations: x = t, y = -5 + 3t

23. 3(x + 1) + 6(y - 5) + 2(z - 6) = 0
25. -18(x - 9) - 51y - 24(z - 4) = 0
A plane 

Exercise  Set  4.1 

1.  (a)  u + v = (2,  6),  3u  = (0,  6) 

Axioms  1-5 

3 The  set  is  a vector  space  with  the  given  operations. 

Not  a vector  space,  Axioms  5 and  6 fail. 

Not  a vector  space.  Axiom  8 fails. 

The  set  is  a vector  space  with  the  given  operations. 

The  set  is  a vector  space  with  the  given  operations. 

True/False  4.1 
(a  False 
False 
(c  True 
(d)  False 
(e  False 

Exercise  Set  4.2 

(a),(c),(e) 

3.  (a),  (b),  (d) 

5.  (a),  (c),  (d) 

7.  (a),  (b),  (d) 

9.  (a),  (b),  (c) 


9.  True 


11  ST(-1,  -1.5) 


11. 


a.  The  vectors  span 
The  vectors  do  not  span 
The  vectors  do  not  span 
The  vectors  span 

The  polynomials  do  not  span 

a-  Line;  x = y = -| t,  z = t 

b.  Lm&;  x = 2t,  y=t,  z = 0 
Origin 

Origin 

e.  Line;  x = -3 1,  y = -2 1,  z = t 

f.  Plane;  x-3y+z  = 0 

True/False  4.2 

(a) True
(b) True
(c) False
(d) False
(e) False
(f) True
(g) True
(h) False
(i) False
(j) True
(k) False

Exercise  Set  4.3 

a.  U2  is  a scalar  multiple  of  uj. 

The  vectors  are  linearly  dependent  by  Theorem  4.3.3. 

P2  is  a scalar  multiple  of  Pi. 

B is  a scalar  multiple  of  A. 

I.  None 

They  do  not  lie  in  a plane. 

They  do  lie  in  a plane. 

(h)  vi  = |v2  - |v3,  v2  = |v i + |v3,  v3  = - jvi  + |v2 
9' A=“2’ A=1 

They are linearly independent since v1, v2, and v3 do not lie in the same plane when they are placed with their initial points at the origin.
They are not linearly independent since v1, v2, and v3 lie in the same plane when they are placed with their initial points at the origin.
21. W(x) = -x sin x - cos x ≠ 0 for some x.

23. a. W(x) = e^x ≠ 0

b. W(x) = 2 ≠ 0

25. W(x) = 2 sin x ≠ 0 for some x.
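
For two functions the Wronskian is W(x) = f(x) g'(x) - g(x) f'(x), and a single point where W(x) ≠ 0 is enough to conclude linear independence. The sketch below evaluates it numerically. The pair f(x) = x and g(x) = cos x is an assumption chosen because it gives W(x) = -x sin x - cos x, as in answer 21, and the function name wronskian2 is hypothetical.

```python
import math


def wronskian2(f, df, g, dg, x):
    """Wronskian of two functions at x, given the functions and their derivatives."""
    return f(x) * dg(x) - g(x) * df(x)


# Assumed pair (illustrative only): f(x) = x and g(x) = cos x.
def f(x):
    return x


def df(x):
    return 1.0


g = math.cos


def dg(x):
    return -math.sin(x)


for x in (0.0, 1.0, 2.0):
    print(x, wronskian2(f, df, g, dg, x))
```

At x = 0 the value is -1, which is nonzero, so this pair is linearly independent.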

True/False  4.3 

(a  False 
True 
(c  False 
True 
True 

(f)  False 

(g)  True 
(h  False 

Exercise  Set  4.4 


1. 


A basis for R^2 has two linearly independent vectors.

A basis for R^3 has three linearly independent vectors.
A basis for P2 has three linearly independent vectors.
A basis for M22 has four linearly independent vectors.


a. (v)_S = (3, -2, 1)

b. (v)_S = (-2, 0, 1)

11. (A)_S = (-1, 1, -1, 3)

13. A = A1 - A2 + A3 - A4
15. p = 7p1 - 8p2 + 3p3
a.  (2,  0) 


True/False  4.4 

(a  False 
False 
True 
True 
(e  False 

Exercise  Set  4.5 

Basis:  (1,0,  1);  dimension  = 1 

3.  Basis:  (4,  1,  0,  0),  (— 3,  0,  1,  0),  (1,  0,  0,  1);  dimension  = 3 
No  basis;  dimension  = 0 


(1,  1,0),  (0,  0,  1) 

c.  (2,  -1,4) 

d.  (1,  1,0),  (0,  1,  1) 

9. a. n

b. n(n + 1)/2

c. n(n + 1)/2

Any two of (0, 1, 0, 0), (0, 0, 1, 0), and (0, 0, 0, 1) can be used.
v3 = (a, b, c) with 9a - 3b - 5c ≠ 0

True/False  4.5 

True 
True 
(c  False 
True 
True 
True 
True 
True 
True 
(J)  False 

Exercise  Set  4.6 


(a),  (b) 

a.  (w)^=(3.  -7) 


(0,  1) 


1. 


[w]*= 


b. 


[w]  S = 


[w]tf  = 


3 

-7 

_5_' 

28 

_3_ 

14 

a 

b-a 

2 


(P )s=(4.  -3,1),  [p]*= 


b. 


(p)^=(0,2,  -1),  [p]*= 


4 

-3 

1 

O' 

2 

-1 


a.  w=  (16,  10,  12) 

b.  q = 3 + 4x2 


B = 


15  -1 
6 3 


b. 


11  _i 

10  2 

-t  • 

o -f 


b. 


[w  ]B  = 

3 2 

-2  =3 

5 1 

[w]B  = 


1Z 

10 

8 

5 

5 
2 

2 

6 


[w  ]B'  = 


9 

-9 

5 


> [w ]B'  = 


_7 

2 

23 

2 

6 


11. 


13. 


(b) 


(c) 


(d) 


(a) 


(b) 


2 0 
1 3 


i 0 

1 1 

“6  3 


Wb 


-[4 


Wb'  = 


l 

-2 


(d) 


1 2 3 

2 5 3 

1 0 8 
-40  16  9 

13  -5  -3 
5 -2  -1 
-239' 


[w  ]B  = 


(e) 


[w]  S = 


11 

30 


Ms= 


3' 

-5 

0 


[w  ]B  = 


5 

-3 

1 

-200 

64 

25 


15. 


(a) 


(b) 


3 5 

=1  -2 
2 5 

-1  -3 


(d)  [w]Bl  = 

2" 

-1_ 

, [w]  b2  = 

~-f 

1_ 

E)  WBj  = 

3" 

-1 

, W By  = 

4" 

-1 

17. 


* i 


=2  -3  - 
5 1 


b. 


[w  ]b1  = 


[w]  b2  = 


_1_ 

2 

23 

2 

6 


19. 


23. 


| cos  29  sin  29 
sin  29  —cos  29 


a.  B=  {(1,1,0),  (1,0,  2),  (0,2,1)) 

bH(M- 


2 2 
5'  5 


•«} 


True/False  4.6 

True 

True 

True 

True 

(e)  False 

(f)  False 

Exercise  Set  4.7 

1.  ri  = (2,  -1,0,1),  r2  =(3,5,7,  - 1),  13  = (1,4,  2.7); 


2 

-1 

0 

1 

ci  = 

3 

- c2  — 

5 

» c3  = 

7 

, c4  = 

-1 

1 

4 

2 

7 

1 
4 

b is  not  in  the  column  space  of  A. 


1 

-1 

1 

5 

9 

-3 

3 

4= 

1 

= 

1 

1 

1 

1 

-1 

d. 


r 

~-r 

1 

+«-d 

1 

-1 

-1 

= -26 


+ t 


4=13 


b. 


1 

0 

-2 

7 

0 

-1 

0 

0 

0 


4 -t 


■ t 


+ t 


4 -r 


1 

3 

1 

-f 

; t 

-1 

. 

1 

-7 


4=4 


-1 

-2 

2 

-1 

-2 

0 

0 

1 

0 

0 

4 ‘S 

1 

+ t 

0 

; r 

0 

4=s 

1 

4=  t 

0 

0 

1 

0 

0 

1 

d. 


6 

7 

1 

1 

1 

5 

5 

5 

5 

5 

7 

4 

3 

4 

3 

5 

5 

5 

; s 

5 

5 

0 

1 

0 

1 

0 

0 

0 

1 

0 

1 

a. 

T 

~2" 

ri  = [1  0 2],  r2  = [0  0 1],  ci  = 

0 

> c2  = 

1 

0 

0 

b. 

T 

~-3~ 

ri  = [l  — 300],  r2  = [0  1 0 0],  ci  = 

0 

0 

> c2  = 

1 

0 

0 

0 

c.  r1  = [1  2 4 5],  r2=  [0  1 -3  0],  r3=  [0  0 1 - 3],  r4=  [0  0 0 1] , 


1 

2 

4 

5 

0 

1 

-3 

0 

0 

> C2  = 

0 

- C3  = 

1 

, c4  = 

-3 

0 

0 

0 

1 

0 

0 

0 

0 

d.  n = [1  2 -1  5],  r2=  [0  14  3],  r3  = [0  0 1 -7],  r4=  [0  0 0 1] 


1 

2 

-1 

5 

0 

1 

4 

3 

ci  = 

0 

- c2  = 

0 

- C3  = 

1 

- c4  = 

-7 

0 

0 

0 

1 

a. 

T 

' 2 " 

ri  = [ 1 0 2];  r2  = [0  0 1 ] ; ci  = 

0 

; c2  — 

1 

0 

0 

b. 

r 

ri  = [l  —3  0 0];r2=[0  1 0 0 ] ; c i = 

0 

0 

; c2  = 

0 

c.  ri  = [ 1 2 4 5 ] ; r2  = [ 0 1 -3  0 ] ; r3  = 

:o 

0 1 - 

-3 

1 

0 

0 

3]; 


r4  = [0  0 0 1 ];  ci  = 


r4  = [0  0 0 1 ];  ci  = 


l 

2 

4 

5 

0 

1 

-3 

0 

0 

; c2  = 

0 

; c3  = 

1 

; c4  = 

-3 

0 

0 

0 

1 

0 

0 

0 

0 

= [0  1 4 

3];  r3  = 

[0  0 

1 -7]; 

1 

2 

-1 

5 

0 

1 

4 

3 

0 

; c2  = 

0 

; c3  = 

1 

; c4  = 

-7 

0 

0 

0 

1 

11. 


15. 


17. 


a.  (1,1,  -4-3),  (0,1,  -5,  -2),  (o,  0,  1, 

b*  (1,  -1,2,0),  (0,  1,0,0),  |0,0,  1,  -i) 

c (1,  1,  0,  0),  (0,  1,  1,  1),  (0,  0,  1,  1),  (0,  0,  0,  1) 

(b) 


0 0 0 
0 1 0 
0 0 1 


- 

[3*  - 


-5a 

5b 


for  all  real  numbers  a,  b not  both  0. 


Since  A and  B are  invertible,  their  null  spaces  are  the  origin.  The  null  space  of  C is  the  line  3*  = 0-  The  null  space  of  D is  the  entire  xj^-plane. 


True/False  4.7 

True 

False 

False 


(d)  False 
False 
True 
True 
False 
True 
False 

Exercise  Set  4.8 

1. rank(A) = rank(A^T) = 2

a.  2;  1 

b.  1;  2 

c.  2;  2 

d.  2;  3 

e.  3;  2 

5.  a Rank  = 4,  nullity  = 0 

b.  Rank  = 3,  nullity  = 2 

c>  Rank  = 3,  nullity  = 0 

a.  Yes,  0 

b.  No 

c.  Yes,  2 

d.  Yes,  7 
No 
Yes,  4 
Yes,  0 

9. b1 = r, b2 = s, b3 = 4s - 3r, b4 = 2r - s, b5 = 8s - 7r

11. No

13. The rank is 2 if r = 2 and s = 1; the rank is never 1.


True/False  4.8 


(a) False
(b) True
(c) False
(d) False
(e) True
(f) False
(g) False
(h) False
(i) True
(j) False

Exercise  Set  4.9 

Domain:  codomain: 

Domain:  codomain:  g} 

Domain:  codomain: 

Domain:  g6;  codomain: 

3.  R2,  R 3,  (-1,2,3) 

a.  Linear;  p?  _» 

Nonlinear;  g? 


c.  Linear;  g?  _ ► 

Nonlinear;  £4  _ ¥ g2 

(a)  and  (c)  are  matrix  transformations;  (b),  (d),  and  (e)  are  not  matrix  transformations. 


9. 


11. 


3 5 -1 

4 -1  1 

3 2-1 


; 7(  — 1,  2,  4)  = (3,  -2,-3) 


25. 


13. 


15. 


17. 


19. 


0 1 
-1  0 
1 3 

1 -1 

7 2—11 
0 1 10 
-10  0 0 

0 0 0 
0 0 0 
0 0 0 
0 0 0 
0 0 0 

d.  0 0 0 1 

10  0 0 

0 0 10 

0 1 0 0 

10-10 

a.  7(-  1,4)  = (5,4) 

b.  7(2,  1,  — 3)  = (0,  -2,0) 

a.  (2,  -5,-3) 

b (2,5,3) 

c.  (-2,  -5,3) 

a.  (“2,1,0) 

b.  (-2,0,3) 
c (0,  1,  3) 


’ b 


1/3 — 2 1+2/3 


b.  (0,  1.2/2) 

c.  (-1.  -2.2) 


21. 


_2  ^+2  -1+2/3 


■2, 

£. 

b.  (-2/2,  1,0) 

c (1,2,2) 


29. 


Twice  the  orthogonal  projection  on  the  x-axis. 
Twice  the  reflection  about  the  x-axis. 


Rotation through the angle 2θ.

33. Rotation through the angle θ and translation by x0; not a matrix transformation since x0 is nonzero.

35. A line in R^n.

True/False  4.9 

(a)  False 
False 
(c)  False 
True 


False 
0 True 
False 
False 
i)  True 

Exercise  Set  4.10 


1. 

' 5 -1  21" 

'-8  -3  r 

II 

0 

10  “8  4 

, TAoTb  = 

00 

1 

m 

7 

1 

45  3 25 

44  -11  45 

b-  T2  O Ti 


Tlo72  = P 


4 

-4 


c. T2(T1(x1, x2)) = (3x1 + 3x2, 6x1 - 2x2),


T1(T2(x1, x2)) = (5x1 + 4x2, x1 - 4x2)
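
The two compositions in part c differ, and the reason is visible at the matrix level: the standard matrix of T2 ∘ T1 is the product [T2][T1], and matrix multiplication does not commute. The sketch below illustrates this. The standard matrices T1 and T2 are assumptions chosen because their products reproduce the two formulas above, and the helper names matmul and apply are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply(m, x):
    """Matrix-vector product m x."""
    return tuple(sum(row[j] * x[j] for j in range(len(x))) for row in m)


# Assumed standard matrices (illustrative only):
# T1(x1, x2) = (x1 + x2, x1 - x2) and T2(x1, x2) = (3x1, 2x1 + 4x2).
T1 = [[1, 1], [1, -1]]
T2 = [[3, 0], [2, 4]]

print(matmul(T2, T1))   # matrix of T2 o T1: rows (3, 3) and (6, -2)
print(matmul(T1, T2))   # matrix of T1 o T2: rows (5, 4) and (1, -4)
```

Applying T1 and then T2 to a vector gives the same result as applying the single matrix matmul(T2, T1), which is a convenient way to check a composition formula.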


b. 


-1  0 0~ 

0 0 0 
0 0 1 

1 0 f 

o {2  0 
-1  0 1_ 

-1  0 0] 

0 1 0 
0 0 0 

a. T1 ∘ T2 = T2 ∘ T1

b. T1 ∘ T2 = T2 ∘ T1

c. T1 ∘ T2 ≠ T2 ∘ T1


1 O' 

0 -1 

0 O' 

0 i 

3 0 
0 -3 


11. 


13. 


Not  one-to-one 

b.  One-to-one 

c.  One-to-one 

d.  One-to-one 

e.  One-to-one 
One-to-one 
One-to-one 


One-to-one; 


Not  one-to-one 
One-to-one; 
Not  one-to-one 


T '("l.  W2)  = (jwi-|w2.  + 

F?  ■;]•  t~‘ 


(wi,  w2)  = (-w2,  -Wl) 


Reflection  about  the  x-axis 
Rotation  through  the  angle 

Contraction  by  a factor  of  -j 

Reflection  about  the  yz-plane 
Dilation  by  a factor  of  5 


17. 


19. 


21. 


23. 


25. 


27. 


29. 


Matrix  operator 
Not  a matrix  operator 
Matrix  operator 
Not  a matrix  operator 

Matrix  transformation 
Matrix  transformation 

-1  0 
0 0 
b.  0 1 
-1  0 
0 0 
3 0 

a. T_A(e1) = (-1, 2, 4), T_A(e2) = (3, 1, 5), T_A(e3) = (0, 2, -3)

b. T_A(e1 + e2 + e3) = (2, 5, 6)

c. T_A(7e3) = (0, 14, -21)

a.  Yes 
Yes 


(b) 


T{x l,  *2)  = (*?  + *2>  'w) 


The  range  of  T is  a proper  subset  of 
T must  map  infinitely  many  vectors  to  0. 

True/False  4.10 

(a  False 
True 
True 
(d  False 
(e^  False 
(f)  False 

Exercise  Set  4.11 

1.  , 


b. 


d. 


0 -1 
-1  0 

-1  0 

0 -1 
1 0 

0 °, 

0 0 
0 1 


a. 

1 

0 

0 

0 

1 

0 

0 

0 

-1 

b. 

1 

0 

0 

0 

•1 

0 

0 

0 

1 

c. 

■1 

0 

0 

0 

1 

0 

0 

0 

1 

a. 

'0 

■1 

0 

1 

0 

0 

0 

0 

1 

b. 

1 

0 

0 

0 

0 

-1 

0 

1 

0 

c. 

0 

0 

1 

0 

1 

0 

■1 

0 

0 

I,  r^(e3)  = ( 0,2,  -3) 


Rectangle  with  vertices  at  (0,  0),  (—3,  0),  (0,  1),  (—3,  1) 


9. 


a.  1 0 

4 1_ 

b.  [1  -2 

0 1 


Expansion  by  a factor  of  3 in  the  x-direction 

Expansion  by  a factor  of  5 in  the  y-direction  and  reflection  about  the  x-axis 
Shearing  by  a factor  of  4 in  the  x-direction 


13. 


b. 


0 5. 

1 °1 
2 5. 

0 -1 
-1  0 


d.  y = — 2x 


No 


23. 


'1  0 k 
0 1 k 
0 0 1 


Shear in the xz-direction with factor k maps (x, y, z) to (x + ky, y, z + ky):

1 k 0
0 1 0
0 k 1

Shear in the yz-direction with factor k maps (x, y, z) to (x, y + kx, z + kx):

1 0 0
k 1 0
k 0 1

True/False  4.11 

False 

True 

True 

True 

False 

False 

True 


Exercise  Set  4.12 

Stochastic 
Not  stochastic 
Stochastic 
Not  stochastic 

3.  [0.54545] 
0.45455J 

Regular 
Not  regular 
Regular 

_8_ 

17 
_9_ 

17 


7. 


11. 


13. 


_4_ 

11 

_4_ 

11 

_3_ 

11 


Probability  that  something  in  state  1 stays  in  state  1 
Probability  that  something  in  state  2 moves  to  state  1 
0.8 
0.85 

[0.95  0.55 
[o.05  0.45 
0.93 
0.142 
0.63 


15. 


Year      1        2        3        4        5
City      95,750   91,840   88,243   84,933   81,889
Suburbs   29,250   33,160   36,757   40,067   43,111

City      46,875
Suburbs   78,125
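
The yearly figures in the table come from repeatedly multiplying a state vector by a transition matrix, and the final City and Suburbs pair is the steady state. The sketch below reproduces the numbers under stated assumptions: the transition matrix P (5% of city residents move out and 3% of suburban residents move in each year) and the initial populations of 100,000 and 25,000 are inferred because they match the listed values, and the function name step is hypothetical.

```python
def step(p, x):
    """One Markov step: the matrix-vector product p x."""
    return [sum(p[i][j] * x[j] for j in range(len(x))) for i in range(len(p))]


# Assumed transition matrix and initial state (inferred, illustrative only).
P = [[0.95, 0.03],
     [0.05, 0.97]]
x = [100000.0, 25000.0]

history = []
for year in range(1, 6):
    x = step(P, x)
    history.append((year, round(x[0]), round(x[1])))
    print(history[-1])

# Steady state: 0.05 * city = 0.03 * suburbs, with city + suburbs = 125,000.
total = 125000
steady_city = round(total * 0.03 / (0.05 + 0.03))
steady_suburbs = total - steady_city
print(steady_city, steady_suburbs)
```

The state vector is kept unrounded between steps and rounded only for display, so rounding error does not accumulate from year to year.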

17. 


23 

100 

46 


159 

22 

53 

47 


159 

35,  50,  35 


19. 


7 1 1 

l 

10  10  5 

3 

1 3 1 

1 

5 10  2 

- q = 

3 

1 3 3 

1 

10  5 10 

3 

P = 


P^k q = q for every positive integer k

True/False  4.12 

True 

True 

(c)  True 

(d)  False 

(e)  True 

Chapter  4 Supplementary  Exercises 

1.  (a)  u + v=(4,3,2),  -u=  (-3,0,0) 

Axioms  1-5 

If s ≠ 1, -2, the solution space is the origin. If s = 1, the solution space is a plane through the origin. If s = -2, the solution space is a line through the origin.
A must  be  invertible 
9.  a Rank  = 2,  nullity  = 1 
b Rank  = 2,  nullity  = 2 
c Rank  = 2,  nullity  = n — 2 


1.  x2,  xA,  . 

...  x: 

x.  x2.  x3.  . 

...  X' 

| where  2m  = n if n is  even  and  2m  =n  — 1 if  n is  odd. 

-I 


11. 


13. 


a. 


1 0 0 
0 0 0 
0 0 0 


'o 

o 

o 

o 

o 

o 

o 

o 

o 

o 

1 0 0 

, 

0 0 0 

0 1 0 

, 

0 0 1 

o ' 

o 

o 

o ' 
o 

o 

o 

o 

o 

o 

0 0 0 
0 0 0 
0 0 1 


0 1 o' 

o 

o 

o 

o 

o 

1 

10  0 

, 

0 0 0 

, 

0 0 1 

l 

o ' 

o 

o 

o 

o 

7 

o 

o 

J 

Possible  ranks  are  2,  1,  and  0. 

Exercise  Set  5.1 

1.  5 

a. λ^2 - 2λ - 3 = 0
b. λ^2 - 8λ + 16 = 0

c. λ^2 - 12 = 0

d. λ^2 + 3 = 0

e. λ^2 = 0

f. λ^2 - 2λ + 1 = 0
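
For a 2 × 2 matrix the characteristic equation always has the form λ^2 - tr(A)λ + det(A) = 0, which is a quick way to check equations like those above. A minimal sketch follows. The two matrices are assumptions chosen because they produce the equations in parts a and d, and the function names char_poly_2x2 and real_eigenvalues are hypothetical.

```python
import math


def char_poly_2x2(a):
    """Coefficients (1, -trace, det) of the characteristic polynomial of a 2 x 2 matrix."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return (1, -trace, det)


def real_eigenvalues(a):
    """Real roots of the characteristic polynomial, in increasing order."""
    _, b, c = char_poly_2x2(a)
    disc = b * b - 4 * c
    if disc < 0:
        return []  # no real eigenvalues, hence no real eigenspaces
    root = math.sqrt(disc)
    return sorted({(-b - root) / 2, (-b + root) / 2})


# Assumed matrices (illustrative only).
A = [[3, 0], [8, -1]]    # gives λ^2 - 2λ - 3 = 0
D = [[-2, -7], [1, 2]]   # gives λ^2 + 3 = 0

print(char_poly_2x2(A), real_eigenvalues(A))
print(char_poly_2x2(D), real_eigenvalues(D))
```

A negative discriminant, as for the second matrix, corresponds to the case where there are no real eigenvalues.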


Basis  for  eigenspace  corresponding  to  A = 3 : 


11. 


13. 


15. 


b. 


Basis  for  eigenspace  corresponding  to  A = 4 : 


Basis  for  eigenspace  corresponding  to  A = /l2: 


There  are  no  eigenspaces. 

Basis  for  eigenspace  corresponding  to  A = 0 : 

f 

Basis  for  eigenspace  corresponding  to  A = 1 : 

a.  1,2,3 

b.  —\[2,  0,  \[2 

c.  -8 

d 2 
e.  2 

1 “4,  3 


a. λ^4 + λ^3 - 3λ^2 - λ + 2 = 0


b. λ^4 - 8λ^3 + 19λ^2 - 24λ + 48 = 0


A=  1:  basis 


b. 


A = 4:basis 


~o" 

0 

’ 

0 

1 

A = — 2:basis 


-1 

0 

1 

0 


; basis  for  eigenspace  corresponding  to  A = — 1 : 


tf\2 

1 


; basis  for  eigenspace  corresponding  to  A = — 


3 

{n 

1 


T 

'o' 

_0_ 

9 

_1_ 

T 

o' 

_0_ 

7 

_1_ 

, A = — l:basis 


-2 

1 

1 

0 


2y  = 512 


(if—L 

UJ  512 
a y = x and  y = 2x 
No  lines 
c y = 0 


True/False  5.1 

False 

False 

True 


(d)  False 
True 

(f)  False 

(g)  False 

Exercise  Set  5.2 

Possible  reason:  Determinants  are  different. 
Possible  reason:  Ranks  are  different. 

5. λ = 0: 1 or 2; λ = 1: 1; λ = 2: 1, 2, or 3

Not  diagonalizable 
Not  diagonalizable 
Not  diagonalizable 


13. 

[1  o] 

1 r 1 

0" 

P = 

3 

P' 

_ 1 1_ 

n 

L° 

— 1 

15. 

_-2  0 

r 

■3 

0 l 

P = 

0 1 

0 

1 

% 

0 

3 l 

1 C 

) 

0 

0 

0 : 

17. 

~1  2 

f 

"1 

0 

o' 

P = 

1 3 

3 

j 

P-'AP  = 

0 

1 2 

0 

1 3 

4 

0 

1 0 

3 

19. 

0 

0 

O 

O 

O 

P = 

1 

1 

00  0 
0 — *■ 

. 1-1  0 

; P~lAP  = 

0 0 0 
0 0 1 

21. 


P = 


"1 

0 

0 

o' 

-2 

0 

0 

0" 

0 

1 

1 

-1 

, P~lAP  = 

0 

-2 

0 

0 

0 

0 

1 

0 

0 

0 

3 

0 

0 

0 

0 

1 

0 

0 

0 

3 

23. 


-1  10237  -2047 
0 1 0 
0 10245  -2048 


25. 


A”  = PD”P~l  = 


"1 

1 

f 

'r 

0 

0 " 

2 

0 -1 

0 

3” 

0 

1 

-1 

1 

0 

0 

4” 

1 

3 

0 - 

1 

'3 


27, 


One possibility is P =

-b  -b
a - λ1  a - λ2

where λ1 and λ2 are as in Exercise 20 of Section 5.1.

a. λ = 1: dimension = 1; λ = 3: dimension ≤ 2; λ = 4: dimension ≤ 3
Dimensions will be exactly 1, 2, and 3.
c. λ = 4

True/False  5.2 

(a)  True 
True 

(c)  True 

(d)  False 

(e)  True 
True 

(g)  True 
True 

Exercise  Set  5.3 

1. ū = (2 + i, -4i, 1 - i), Re(u) = (2, 0, 1), Im(u) = (-1, 4, 1), ||u|| = √23
5. x = (7 - 6i, -4 - 8i, 6 - 12i)


7. 


5 i 4 

2 + i 1 — 5i 


Re  (A) : 


0 4 
2 1 


Im(^): 


-5 

-1 


det(.d)  = 17  — i,  tr(^4)  = 1 


11. u · v = -1 + i, u · w = 18 - 7i, v · w = 12 + 6i
13.  — 11  — 


2 + i 

1 

1 +i 
1 

19.  |A|  = \[2,  0=f 


21.  |A|  = 

2,  0=  - 

w 

3 

W 

II 

T—  O ' ' 
1 

CM  CM 

1 

1 L 

, c= 

'3  -2" 
_2  3_ 

25.  p = 
27.  a. 
b. 

1 -1 
-l  o_ 

lr-  8- 

k~  “ T 

None 

, c= 

■ 

5 -3~ 

.3 

Ai  = 2 — xi  = 

2 — i 

1 

; A2  = 2 + xi  = 

Ai  =4  — i,  xi  = 

"1-T 

1 

; A2  = 4 + j,  xi  = 

True/False  5.3 

(a  False 
True 
(c  False 
True 
(e  False 
(f)  False 

Exercise  Set  5.4 

1. a. y1 = c1 e^(5x) - 2c2 e^(-x)
y2 = c1 e^(5x) + c2 e^(-x)
b. y1 = 0
y2 = 0

3. a. y1 = -c2 e^(2x) + c3 e^(3x)
y2 = c1 e^x + 2c2 e^(2x) - c3 e^(3x)

y3 = 2c2 e^(2x) - c3 e^(3x)

b. y1 = e^(2x) - 2e^(3x)
y2 = e^x - 2e^(2x) + 2e^(3x)
y3 = -2e^(2x) + 2e^(3x)
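
A candidate solution of a linear system y' = Ay can be checked by differentiating it and comparing with Ay term by term. The sketch below does this exactly for the particular solution in part b, storing each function as a sum of exponentials. The coefficient matrix A is an assumption chosen because the listed solution satisfies it, and the helper names derivative and combine are hypothetical.

```python
def derivative(y):
    """Differentiate a sum of exponentials stored as {rate: coefficient}."""
    return {r: r * c for r, c in y.items()}


def combine(coeffs, ys):
    """Linear combination sum(coeffs[i] * ys[i]), dropping zero terms."""
    out = {}
    for a, y in zip(coeffs, ys):
        for r, c in y.items():
            out[r] = out.get(r, 0) + a * c
    return {r: c for r, c in out.items() if c != 0}


# Assumed coefficient matrix (illustrative only).
A = [[4, 0, 1],
     [-2, 1, 0],
     [-2, 0, 1]]

# Part b: y1 = e^(2x) - 2e^(3x), y2 = e^x - 2e^(2x) + 2e^(3x), y3 = -2e^(2x) + 2e^(3x)
y = [{2: 1, 3: -2}, {1: 1, 2: -2, 3: 2}, {2: -2, 3: 2}]

for row, yi in zip(A, y):
    print(combine(row, y) == derivative(yi))
```

Because the coefficients are integers, the comparison is exact and involves no floating-point tolerance.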

7. y = c1 e^(3x) + c2 e^(-2x)
9. y = c1 e^x + c2 e^(2x) + c3 e^(3x)

True/False  5.4 

(a  False 
False 
(c  True 
True 
(e  False 

Chapter  5 Supplementary  Exercises 

The transformation rotates vectors through the angle θ; therefore, if 0 < θ < π, then no nonzero vector is transformed into a vector in the same or opposite direction.


3. 


(c) 


1 1 0 
0 2 1 
0 0 3 


15  30~ 

3 = p5  1501 

. A4= 

'375  750' 

. A5  = 

'1875  3750' 

. 5 10_ 

L25  50 

125  250_ 

_ 625  1250 

11  0,  tr(4) 


They  are  all  0. 

is.  r i o o 


They are all 0, 1, or -1.

Exercise  Set  6.1 

a.  5 

b. 

c.  -3 

d.  /l3 

e.  {5 

f.  /89 

a.  2 

b.  11 

c.  -13 

d.  -8 

e.  0 

5-  a.  -5 

b.  1 

c.  -7 

d.  1 

e.  1 

f.  1 

a.  3 

b.  56 

29 

{3  0 
0 {5 

b.  [2  0 ■ 

0 {1 

a.  ^74 

0 


a.  /l05 

b.  ^47 

7.  (p,  q}  = 50,  II p II  = 6^3 

a.  3 f2 

b.  3/5 

c.  3/l3 


b. 


A? 


X 


For^=|^  ^ J , then  [V,  V)  = — 2 < 0,  so  Axiom  4 fails. 


29. 


l _28 

15 

b.  0 

True/False  6.1 

True 
False 
True 
d)  True 
False 
f)  True 
False 

Exercise  Set  6.2 


b 3_ 

/73 


c.  0 

d 20_ 

9/To 

e.  L 

f2 

f.  _2_ 

3-  a.  19 

10/7 

b.  0 

7.  No 

9.  a.  *=-3 

b.  * = -2,  — 3 

13.  No 

a. x = t, y = -2t, z = -3t

b. 2x - 5y + 4z = 0

c. x - z = 0

a. The line y = -x

b. The xz-plane
The x-axis

True/False  6.2 

(a)  False 
True 
True 
True 
False 
(f)  False 

Exercise  Set  6.3 

1.  (a),  (b),  (d) 

3.  (b),  (d) 

5.  (a) 


[kkh\  (' kk°\  ikk'fc) 


9. 


11. 


13. 


a-  - jVi  + JV2  4*  2v3 
— yvi  -|v2  + 4v3 

7 1 5 

C.  -|vj  - yV2  + yV3 

h)  u=  - ^V!  - j^v2  + 0v3  + lv4 

a.  W=MU[  _|U2_IU3 


b * w = —=117  + 


11 


/6 


u3 


15. 

a. 

(2  5 _ 

2 _21 

U’  4’ 

4’  4j 

b. 

/ 17  7 

1 23  ^ 

U2’  4'  ' 

12’  12  J 

17. 

a. 

/ 23  11 

1 17  3 

\1S'  6 ’ 

18’  18 J 

b. 

/3  3 

i n 

U’  2’ 

2'  2 j 

19. 

a. 

— & 

i.  -i.  -i)..,- 

b. 

n 

5 3 5) 

W1  = \4  ’ 

4’“4’-4>W2  = 

21. 

a. 

f 1 

3 ^ / 

VH^'W  V2“^ 

* 


b.  VI  = (1,0),  v2  = (0,  -1) 


23. 


vi 


= fo,  A.  X o\  v2  = Lx.  - 1 2 o| 

^ /?  /?  J [\[30  /30  /30  J 


v3  = 


/To’  /To’ 


2 

/To’ 


2 

/To 


v4  = 


_j L_ 

/T?’  /T?’ 


25. 


_2 3_ 

/T?’  /T? 


27  W1  = fl3  31  40.1  M _ J_  _2J 
1 U4'  14’  14/  2 U4’  14  ’ 14  J 


29. 


1 2_ 

f5  ~f5 

2 1 

f5  f5 


f5  f5 
0 { 5 


J_ i_ 

f2 

0 ~T 

l/3 

1 1 

f2  fl 


1 

3 

_2 

3 

2 

3 


d. 


8 


{234 

11 

/234 

7 

/234 


{2  3/2 
0 /3 


1 

3 

l/26 

3 


J_ 1_  1 

{2  {i  {l 

0 JL  _2_ 

/6 


J_  J_ 

f2  f3 


{l  {l  {2 

. fi- L 

0 0-2= 

1/6 


j_  _/L 


I 

?2 

1 


2/19 

_JL 

2/1? 

3|/2 
/ 19 

Columns  not  linearly  independent 
33. v1 = 1, v2 = √3(2x - 1), v3 = √5(6x^2 - 6x + 1)
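
An orthonormal set of polynomials can be verified by computing every pairwise inner product. The sketch below does this with exact rational arithmetic, assuming the inner product ⟨p, q⟩ = ∫ p(x) q(x) dx over [0, 1]; that inner product is an assumption here, chosen because the three polynomials listed above are orthonormal under it. The function name inner is hypothetical.

```python
from fractions import Fraction


def inner(p, q):
    """Integral over [0, 1] of p(x) q(x); coefficient lists run from lowest degree up."""
    return sum(
        Fraction(a * b, i + j + 1)
        for i, a in enumerate(p)
        for j, b in enumerate(q)
    )


one = [1]           # 1
lin = [-1, 2]       # 2x - 1
quad = [1, -6, 6]   # 6x^2 - 6x + 1

# Orthogonality: every mixed inner product is zero.
print(inner(one, lin), inner(one, quad), inner(lin, quad))

# Squared norms before scaling: 1, 1/3, and 1/5, so multiplying the second
# polynomial by sqrt(3) and the third by sqrt(5) gives unit vectors.
print(inner(one, one), inner(lin, lin), inner(quad, quad))
```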


3_ 

/l9 

1 

3 

3^ 

/l9 

^ /l9 

1 

0 0 -p= 

/l9 

/l9 

True/False  6.3 

False 
False 
True 
d)  True 
False 
f)  True 


Exercise  Set  6.4 


a. 

'21 

25' 

"*l“ 

"20" 

25 

35 

x2_ 

20 

b. 

' 15 

-1 

5' 

■*r 

-l" 

-1 

22 

30 

x2 

= 

9 

5 

30 

45 

x3 

13 

a-  xi=5,  x2  = ^ 

b.  *i  = 12,  X2  = -3,  7:3  = 9 


5. 


a. 


3 

2 

9 

2 

=3 


e = 


3 

-3 

0 

3 


Solution:  x = j;  least  squares  error: 

Solution:  x = (y,  0 j 4= 1{— 3,  1)  (t  a real  number);  least  squares  error:  y ^42^ 


Solution: 


: x = 0 j =M(  — 1,  — 1,  1)  (t  a real  number);  least  squares  error:  -^-j/294 


(7,  2,  9,  5) 

ii  _4  12  J6 
5 ’ 5’  5 ’ 5 


) 


11. 


13. 


det(A^T A) = 0; A does not have linearly independent column vectors.
det(A^T A) = 0; A does not have linearly independent column vectors.


[P]  = 


[?]  = 


1 0 0 
0 0 0 
0 0 1 
0 0 0 
0 1 0 
0 0 1 


15. 


a.  (1,0, 


b. 


5),  (0,1,3) 

10  15  -5 
15  26  3 

-5  3 34 

c.  I 2xq  I 3yQ  -zq  15x0  1 26^o  I 3 zq  -5x0  I 3^q  I 34z0 


|f|-B 


/ 2xq  + 3^q  -zq  \5xq  + 26,yo  + 3zq  -5xq  4-  3^0  + 34zq  \ 

^ 7 35  35  } 


d.  3^35 

7 

17.  s = l = l 

[P] = A^T (A A^T)^{-1} A

True/False  6.4 

(a) True
(b) False
(c) True
(d) True
(e) False
(f) True
(g) False
(h) True

Exercise  Set  6.5 

3.  y = 2 + 5x - 3x^2

l.y  = J_+48 

' 21  ^ lx 


I I I I I I I 


10 


True/False  6.5 

(a) False
(b) True
(c) False
(d) True

Exercise  Set  6.6 

a.  (1  4=  it)  — 2 sin  x — sin  2x 

sinx  4= 


) - 2^si 


b.  ^>1  - . sin2x  . sin3x  , , sin 


- + - 


\nx  1 

» J 


a A + —, 

2 e — 1 

b -0-  1 +■  g 

12  2(1  -e] 


3. 


5. 


’■KlM1-'-0*]-*’ 

True/False  6.6 

(a) False
(b) True
(c) True
(d) False
(e) True


Chapter  6 Supplementary  Exercises 
a.  (0, a, a, 0) with a ≠ 0

b-  ■ _2_  _L  01 

f5'  F 


± 0, 


The subspace of all matrices in M22 with only zeros on the diagonal.
The subspace of all skew-symmetric matrices in M22.


K No 

0 approaches 

17.  No 


Exercise  Set  7.1 


1. 


3. 


(a) 


(b) 


1 0 
0 1 
1 

f2 

1 

~f2 


(d) 


f2 

ft 

f3 


(e) 


1 1 

2 2 

1 

2 6 

1 1 

2 6 

1 1 

2 6 


9 

12 

25 

25 

4 

3 

5 

5 

12 

16 

25 

25 

f2 

f2 


0 

_2_ 

ft 

1 

2 

1 

6 

1 

6 

5 

6 


f2 

ft 

fl 


1 

2 

1 

6 

5 

6 

1 

6 


a.  (—  l + 3/3,3+/3) 

- (f-^5. 


9. 


11. 


A = 


A = 


if*-  • 

3 

2 

[  cos θ    0    -sin θ ]
[    0      1      0    ]
[  sin θ    0     cos θ ]

[  1      0        0    ]
[  0    cos θ    sin θ  ]
[  0   -sin θ    cos θ  ]

13-  32  + £2  = ± 

The  only  possibilities  are  fl  = ^ = ^ ’ c ~ 0r  a = ^ = ^ ’ c = 


Rotations  about  the  origin,  reflections  about  any  line  through  the  origin,  and  any  combination  of  these 

Rotation  about  the  origin,  dilations,  contractions,  reflections  about  lines  through  the  origin,  and  combinations  of  these 

No;  dilations  and  contractions 


True/False  7.1 

(a) False
(b) False
(c) False
(d) False
(e) True
(f) True
(g) True
(h) True

Exercise  Set  7.2 


a.  λ^2 - 5λ = 0; λ = 0: one-dimensional; λ = 5: one-dimensional

b.  λ^3 - 27λ - 54 = 0; λ = 6: one-dimensional; λ = -3: two-dimensional

c.  λ^3 - 3λ^2 = 0; λ = 3: one-dimensional; λ = 0: two-dimensional

d.  λ^3 - 12λ^2 + 36λ - 32 = 0; λ = 2: two-dimensional; λ = 8: one-dimensional

e.  λ^4 - 8λ^3 = 0; λ = 0: three-dimensional; λ = 8: one-dimensional

f.  λ^4 - 8λ^3 + 22λ^2 - 24λ + 9 = 0; λ = 1: two-dimensional; λ = 3: two-dimensional
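The characteristic equations and eigenvalues listed for this exercise can be cross-checked mechanically. The following is a minimal sketch, not the textbook's method: it tests each listed eigenvalue as a root of the listed polynomial and counts its algebraic multiplicity by repeated differentiation. The helper names are hypothetical. For symmetric matrices (the setting of this section, an assumption here) the algebraic multiplicity equals the eigenspace dimension quoted in the answers.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial (highest-degree coefficient first) by Horner's rule."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


def derivative(coeffs):
    """Coefficients of the derivative, highest degree first."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def multiplicity(coeffs, root):
    """Number of times `root` is a root: p, p', p'', ... vanish there."""
    m = 0
    while len(coeffs) > 1 and poly_eval(coeffs, root) == 0:
        m += 1
        coeffs = derivative(coeffs)
    return m


# Part (b): lambda^3 - 27*lambda - 54, with eigenvalues 6 and -3.
part_b = [1, 0, -27, -54]
# Part (f): lambda^4 - 8*lambda^3 + 22*lambda^2 - 24*lambda + 9, eigenvalues 1 and 3.
part_f = [1, -8, 22, -24, 9]

print(multiplicity(part_b, 6), multiplicity(part_b, -3))
print(multiplicity(part_f, 1), multiplicity(part_f, 3))
```

The multiplicities add up to the degree of each polynomial, which is a quick sanity check that no eigenvalue has been missed.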


3. 


P = 


__2_  £L 

~fi  f 

fL  _2_  ’ 
f f 


P~lAP  = 


3 0 
0 10 


5. 


3 
5 
0 

4 

5 


P~lAP  = 


25 

0 

0 


0 0 
-3  0 

0 “50 


7. 


P = 


f 

f 

f 


f 

f 

0 


P~lAP  = 


0 0 0 
0 3 0 
0 0 3 


P = 


0 0 


i ^ oo 


0 0- 


oo  i i 


; P~XAP~- 


-25  0 
0 25 


0 0 -25  0 

0 0 0 25 


5.  No 
9.  Yes 

True/False  7.2 


(a) True
(b) True
(c) False
(d) True
(e) True
(f) False
(g) True


Exercise  Set  7.3 

1. 


[*1  *2 ] 


b. 


[*1  *2] 


3 oir*i 

0 7 *2 


!][: 
[J  u] 


[*1  *2*3] 


9 
3 

-4  1 


n] 

3 -4 

- 1 

4 


3 2x2  + 5y2-6xy 


1 1 

{l  {l 

1 1 

1 2 
■3  3 
2 1 
3 3 

1 2 

3 3 


[£}  Q=M+y\ 


e=^i  +472+^3 


[*^] 


b. 


[*?] 


i 0 


0 0 
0 


!] 


;]  + [7_8]p]-5  = 0 


11. 


ellipse 
hyperbola 
c parabola 
circle 


3.  Hyperbola: 2(y')^2 - 3(x')^2 = 8; θ ≈ -26.6°
Hyperbola: 4(x')^2 - (y')^2 = 3; θ ≈ 36.9°
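The rotation angles quoted for the two hyperbolas come from the eigenvectors of the symmetric matrix of the quadratic form. The sketch below illustrates that computation for a form a*x^2 + 2*b*x*y + c*y^2. The two sample forms, 11x^2 + 24xy + 4y^2 and 2x^2 - 4xy - y^2, are assumptions chosen because they reproduce the angles and reduced equations listed above; the exercise statements themselves are not reproduced in this answer key. The function name is hypothetical.

```python
import math


def principal_axes(a, b, c):
    """Eigenvalues and x'-axis angle (degrees) for a*x^2 + 2*b*x*y + c*y^2.

    The x'-axis is taken along an eigenvector of the larger eigenvalue,
    and the angle is normalized to the interval (-90, 90].
    """
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lam1, lam2 = mean + radius, mean - radius
    if b == 0:
        theta = 0.0 if a >= c else 90.0
    else:
        # (b, lam1 - a) is an eigenvector for lam1 when b != 0.
        theta = math.degrees(math.atan2(lam1 - a, b))
        if theta > 90:
            theta -= 180
        elif theta <= -90:
            theta += 180
    return lam1, lam2, theta


# Assumed form 11x^2 + 24xy + 4y^2 = 15  ->  20x'^2 - 5y'^2 = 15, i.e. 4x'^2 - y'^2 = 3.
print(principal_axes(11, 12, 4))
# Assumed form 2x^2 - 4xy - y^2 + 8 = 0  ->  3x'^2 - 2y'^2 + 8 = 0, i.e. 2y'^2 - 3x'^2 = 8.
print(principal_axes(2, -2, -1))
```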


Positive  definite 
Negative  definite 
Indefinite 

Positive  semidefinite 
Negative  semidefinite 


Positive  definite 
Positive  semidefinite 
Indefinite 
7.  k>2 


n »(»- 1)  »(»-l) 

1 1 ...  1 

i4=  »(»-l)  » »(»-l) 

: : ! 

1 1 ...  1 
»(»-!)  »(»-!)  « 


Yes 


A must  have  a positive  eigenvalue  of  multiplicity  2. 

True/False  7.3 

(a) True
(b) False
(c) True
(d) True
(e) False
(f) True
(g) True
(h) True
(i) False
(j) True
(k) False
(l) False


Exercise  Set  7.4 

1.  Maximum: 5 at (1, 0) and (-1, 0); minimum: -1 at (0, 1) and (0, -1)

Maximum:  7 at  (0,  1)  and  (0,  -1);  minimum:  3 at  (1,  0)  and  (-1,  0) 

Maximum:  9 at  (1,  0,  0)  and  (-1,0,  0);  minimum:  3 at  (0,  0,  1)  and  (0,  0,  -1) 

7.  Maximum: z = 4√2 at (x, y) = (2√2, 2) and (-2√2, -2); minimum: z = -4√2 at (x, y) = (-2√2, 2) and (2√2, -2)


Critical  points:  (-1,  1),  relative  maximum;  (0,  0),  saddle  point 
Critical  points:  (0,  0),  relative  minimum;  (2,  1)  and  (-2,  1),  saddle  points 


Corner  points:  x 
21  <7  00=  A 


f2 


True/False  7.4 

False 

True 

True 

False 

True 


Exercise  Set  7.5 

" -2 i 4 5- 

1+i  3 -x  0 


1 


1 i 2-3  i 
A=  -1  -3  1 

2 + 3i  1 2 

a.  “13**31 

b.  “22  * “22 


9. 


A = A~L  = 


11. 


4 -i’ 

— i + |/3 

2/2 

2/2 


1-i|/3 

2/2 

2/2 


13. 


15. 


17. 


P = 


f f • *-[ 

/3  _ 

-1  — i 1+r 

f f • H 


-2  0 0 
0 1 0 
0 0 5 


19. 

0 

i 

2- 

A = 

i 

0 

1 

-2-3  i 

-1 

Ai 

21. 

a. 

<*13*  - 

031 

b. 

a\\ * - 

au 

29. 

B and  C must  comn 

37. 

i 

/2  /2 

_l_ L 

f2  ~f2 

Multiplication  of  x by  P corresponds  to  ||u||2  times  the  orthogonal  projection  of  x onto  W = span  (u)  . If  ||u||  = 1,  then  multiplications  of  x by  //  = / _ 2uu* 
corresponds  to  reflection  of  x about  the  hyperplane  u 1 • 

True/False  7.5 

(a) False
(b) False
(c) True
(d) False
(e) False

Chapter  7 Supplementary  Exercises 


1. 


-l 


3 1 
5 5 

1 2 

"5  5 


P = 


4 

0 

3" 

-1 

4 

9 

12' 

5 

5 

5 

25 

25 

9 

4 

12 

0 

4 

3 

25 

5 

25 

5 

5 

12 

3 

16 

3 

12 

16 

25 

5 

25 

■5 

25 

25 

1 

~f2 

1 

f2 

0 

'0 

0 

O' 

0 

0 

1 

; PrAP  = 

0 

2 

0 

1 

1 

0 

0 

0 

1 

f2 

f2 

positive  definite 
a.  parabola 
parabola 

Exercise  Set  8.1 

Nonlinear 

Linear 

Linear 


7. 


a.  Linear 
Nonlinear 


9.  T(x1, x2) = (-4x1 + 5x2, x1 - 3x2); T(5, -3) = (-35, 14)

11.  T(x1, x2, x3) = (-x1 + 4x2 - x3, 5x1 - 5x2 - x3, x1 + 3x3); T(2, 4, -1) = (15, -9, -1)
13.  T(2v1 - 3v2 + 4v3) = (-10, -7, 6)
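The two formulas above can be checked by direct substitution. The following short sketch encodes them as Python functions (the names `T9` and `T11` are hypothetical labels for the transformations in answers 9 and 11) and evaluates them at the stated points.

```python
def T9(x1, x2):
    """Transformation from answer 9: T(x1, x2) = (-4x1 + 5x2, x1 - 3x2)."""
    return (-4 * x1 + 5 * x2, x1 - 3 * x2)


def T11(x1, x2, x3):
    """Transformation from answer 11."""
    return (-x1 + 4 * x2 - x3, 5 * x1 - 5 * x2 - x3, x1 + 3 * x3)


print(T9(5, -3))       # evaluates T at (5, -3)
print(T11(2, 4, -1))   # evaluates T at (2, 4, -1)
```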


(a) 
7.  (a) 
(a) 


21. 


23. 


a.  (1.  -4) 

b-  (1,0,0),  (0,1,0),  (|,  -4,l) 


5 , 6 

Jl  I 4 

b.  f —14" 

19 

11 


c.  Rank(T) = 2, nullity(T) = 1
Rank(A) = 2, nullity(A) = 1


25. 


c.  Rank(T) = nullity(T) = 2
Rank(A) = nullity(A) = 2


Kernel:  y-axis;  range:  xz-plane 
Kernel:  x-axis;  range:  yz-plane 

Kernel:  the  line  through  the  origin  perpendicular  to  the  plane  y = x;  range:  plane  y = x 


Nullity  (T)  = 2 
Nullity  (T)  = 4 
Nullity  (T)  = 3 

d.  Nullity(T)  = 1 

a.  3 

No 


A line  through  the  origin,  a plane  through  the  origin,  the  origin  only,  or  all  of 

35-  (b)  No 


ker(D)  consists  of  all  constant  polynomials. 

a.  JV(z)W«(x) 

b.  T(/ (*))=/(”+%) 


True/False  8.1 

(a) True
(b) False
(c) True
(d) False
(e) True
(f) True
(g) False
(h) False
(i) False

Exercise  Set  8.2 


a.  ker(T)  = {0};  T is  one-to-one 

ker(T)  = | 1 j | ; T is  not  one-to-one 

c ker(T)  = {0};  T is  one-to-one 
ker(T)  = {0};  T is  one-to-one 
e.  ker(T) = {k(1, 1)}; T is not one-to-one
ker(T) = {k(0, 1, -1)}; T is not one-to-one

a.  Not  one-to-one 
Not  one-to-one 
c One-to-one 

a.  ker(T)  = {*(  - 1,  1)} 

T is not one-to-one since ker(T) ≠ {0}.

T is  one-to-one 
T is  not  one-to-one 
T is  not  one-to-one 
T is  one-to-one 


11. 


a. 


J a 

b 

c 

b 

d 

e 

= 

\ c 

e 

/_ 

l\a 

b 

)- 

b 

I* 

d_ 

r 

c 

A 

a b 
c d 


T{ax 5 + bx 1 + cx)  = 


d. 


T{a  + b sin(x)  + c cos(x))  = 


T is not one-to-one since, for example, f(x) = x^2(x - 1)^2 is in its kernel.
Yes;  it  is  one-to-one 

T is  not  one-to-one  since,  for  example  a is  in  its  kernel. 

19.  Yes 

True/False  8.2 

(a) False
(b) True
(c) False
(d) True
(e) False
(f) False

Exercise  Set  8.3 


1. 


a.  (T2 ∘ T1)(x, y) = (2x - 3y, 2x + 3y)

b.  (T2 ∘ T1)(x, y) = (4x - 12y, 3x - 9y)

c.  (T2 ∘ T1)(x, y) = (2x + 3y, x - 2y)

d.  (T2 ∘ T1)(x, y) = (0, 2x)
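A composition of linear transformations corresponds to the product of their standard matrices, with the matrix of the transformation applied second written on the left. The sketch below illustrates this for part (a), assuming T1(x, y) = (2x, 3y) and T2(x, y) = (x - y, x + y). These two definitions are an assumption chosen to reproduce the stated answer, since the exercise statement is not part of this answer key.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def apply(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


T1 = [[2, 0], [0, 3]]    # assumed: T1(x, y) = (2x, 3y)
T2 = [[1, -1], [1, 1]]   # assumed: T2(x, y) = (x - y, x + y)

# (T2 o T1) applies T1 first, so its matrix is [T2][T1].
composite = matmul(T2, T1)
print(composite)                 # rows encode (2x - 3y, 2x + 3y)
print(apply(composite, [1, 1]))
```

Reversing the order of the product would give the matrix of T1 ∘ T2, which in general is a different transformation.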

a.  a + d 

(T2 ∘ T1)(A) does not exist since T1(A) is not a 2 × 2 matrix.


5 r2(v)  = Iv 


11. 


T has  no  inverse. 


b. 

77-1 

_*l' 

*2 

_*3_ 

= 

8*‘  + 8*2"4X3 
|*l+£*2  + J*3 

-8x,+fxa+4*3 

c. 

■*r 

*2 

_*3_ 

2*1_2*2+2*3 

7"1 

= 

4X1  + 2X2+2X3 

^1  +^2“^3 

'*l‘ 

3^1  4=  3^2  — *3 

7_1 

x2 

= 

—2x\  — 2x2  + x2 

x3 

—Ax\  — 5x2  4=  2x2 

13.  a.  a_i ≠ 0 for i = 1, 2, 3, ..., n

b.  T^{-1}(x1, x2, x3, ..., xn) = (x1/a1, x2/a2, x3/a3, ..., xn/an)

a-  Tf1  (/>(*))  = ^1;  771  (*(*))  = ,(*  - 1);  (72  o 70  (*>(*))  = ±p(x  - 1) 

17-  (a)  0.  - 1) 

(d)  T^{-1}(2, 3) = 2 + x

21.  a.  T1 ∘ T2 = T2 ∘ T1

b.  T1 ∘ T2 ≠ T2 ∘ T1

c.  T1 ∘ T2 = T2 ∘ T1


True/False  8.3 

(a) True
(b) False
(c) False
(d) True
(e) False
(f) True


Exercise  Set  8.4 


a.  0 0 0 
1 0 0 
0 1 0 
0 0 1 

a.  [1  -1  1 

0 1 -2 
0 0 1 

a.  I"  0 0" 


8 4 
3 3 

a.  1 1 1 

0 2 4 
0 0 4 

b.  3 + 10x + 16x^2


lT<Ji)]B  = 


b. 


T(y  1) 


44 

)- 


[T(v  2>]B-- 


T(y  2)  = 


18  I 

7 7 

107  24 

7 7 


c. 


d. 


19 

7 

83 

' 7 


11. 


a. 

T 

3" 

-r 

[T(vi)]b  = 

2 

, [T(v2)]b  = 

0 

. [r(v3)]B= 

5 

6 

-2 

4 

13. 


b.  T(v1) = 16 + 51x + 19x^2, T(v2) = -6 - 5x + 5x^2, T(v3) = 7 + 40x + 15x^2

«•  7(fl0  + fllx  + fl2x2)=  239^0-16^  + 289^2  + 201«q  - 111^  + 247«2  ^ + 61^-31^  + 107^, 

d.  T(1 + x^2) = 22 + 56x + 14x^2


[T2  o T\]b>b  = 


b.  [T2qT\\b\B=  V^2]b\B,,\T\]b,,,B 


0 0 o\ 

1 

0 ^0  0 

> [7’2]b',B',= 

'0  0 o' 

3 0 0 
0 3 0 

- ITl]£”,S  = 

'2  o' 

0 -3 
n n 

0 0 

0 0 3 

u u 

19. 


b. 


0 0 
0 0 


0 1 0 


0 0 


d.  14e^{2x} - 8xe^{2x} - 20x^2e^{2x}, since

[ 2  1  0 ] [   4 ]   [  14 ]
[ 0  2  2 ] [   6 ] = [  -8 ]
[ 0  0  2 ] [ -10 ]   [ -20 ]
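The matrix product in part (d) can be read as differentiation carried out in coordinates. The sketch below assumes the basis B = {e^{2x}, xe^{2x}, x^2e^{2x}}, which is inferred from the form of the answer rather than stated in this answer key. Under that assumption the columns of the matrix are the coordinates of the derivatives of the basis functions, and multiplying by the coordinate vector (4, 6, -10) differentiates 4e^{2x} + 6xe^{2x} - 10x^2e^{2x}. A finite-difference check is included as an independent confirmation.

```python
import math

# Matrix of the differentiation operator relative to the assumed basis
# {e^{2x}, x e^{2x}, x^2 e^{2x}}; column j holds the coordinates of D(basis_j).
D = [[2, 1, 0],
     [0, 2, 2],
     [0, 0, 2]]
coords = [4, 6, -10]   # coordinates of 4e^{2x} + 6x e^{2x} - 10x^2 e^{2x}


def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def from_coords(c, x):
    """Evaluate c0*e^{2x} + c1*x*e^{2x} + c2*x^2*e^{2x} at x."""
    return (c[0] + c[1] * x + c[2] * x * x) * math.exp(2 * x)


derivative_coords = matvec(D, coords)
print(derivative_coords)

# Independent numerical check with a central difference at x = 0.3.
h = 1e-6
numeric = (from_coords(coords, 0.3 + h) - from_coords(coords, 0.3 - h)) / (2 * h)
print(numeric, from_coords(derivative_coords, 0.3))
```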

21. 


a,  Btt  Bn 

b.  B',  Bw 


True/False  8.4 

(a) False
(b) False
(c) True
(d) False
(e) True

Exercise  Set  8.5 

1. 

[T]B=  l I?].  Wb,= 


3_  _5 6 
11  11 
_2_  J_ 
'll  11 


3. 


[T]b  = 


[T]b  = 


[T]b  = 


1 i_" 

13  25 

f2 

f2 

II/2 

ll/2 

1 1 

5 

9 

f2 

f2 

11/2 

11/2 

1 0 0 
0 1 0 
0 0 0 

2 _2 
3 9 

1 1 

2 3 


[*■]»  = 


1 0 0 
0 1 1 
0 0 0 


[T)b,-- 


=p 

[o  lj 


11. 


-{[4  m 

-3-^21 


—3  4°  ^2\ 


Bt  = 


13. 


a.  λ = -4, λ = 3

b Basis  for  eigenspace  corresponding  to  A = — 4 : — 2 + x 4=  x^',  basis  for  eigenspace  corresponding  toA  = 3:  5 — 2x  =F  x 2 


The  choice  of  an  appropriate  basis  can  yield  a better  understanding  of  the  linear  operator. 

True/False  8.5 

(a) False
(b) True
(c) True
(d) True
(e) True
(f) False
(g) True
(h) False

Chapter  8 Supplementary  Exercises 

1.  No. T(x1 + x2) = A(x1 + x2) + B ≠ (Ax1 + B) + (Ax2 + B) = T(x1) + T(x2), and if c ≠ 1, then T(cx) = cAx + B ≠ c(Ax + B) = cT(x).
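The failure of both linearity conditions is easy to see with concrete numbers. The sketch below uses a scalar stand-in for the map T(x) = Ax + B, with A = 3 and B = 2 chosen purely for illustration (any nonzero B behaves the same way).

```python
A, B = 3, 2   # illustrative values only; B must be nonzero for the argument


def T(x):
    """Affine map T(x) = Ax + B, a scalar stand-in for the matrix case."""
    return A * x + B


x1, x2, c = 1, 2, 2

# Additivity fails: the left side contains B once, the right side twice.
print(T(x1 + x2), T(x1) + T(x2))
# Homogeneity fails for c != 1: the left side contains B, the right side cB.
print(T(c * x1), c * T(x1))
```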

T(e3) and any two of T(e1), T(e2), and T(e4) form bases for the range; (-1, 1, 0, 1) is a basis for the kernel.

\y  Rank  = 3,  nullity  = 1 

7.  a.  Rank(T) = 2 and nullity(T) = 2
T is  not  one-to-one. 
ip  Rank  = 3,  nullity  = 1 


13. 


10  0 0 
0 0 10 
0 10  0 
0 0 0 1 


15. 

'-4 

0 

IT)b, 

= 

1 

0 

0 

1 

17. 

1 - 

■1 

[T]b-- 

0 

1 

1 

0 

19. 

(b) 

II 

X, 

(c) 

II 

2 

V\ 

21. 

(d) 

The  points  ; 

25. 

0 0 

0 ■ 

1 0 0 
0 1 0 


0 0 — 


0 0 0 


tf  + 1 


Exercise  Set  9.1 
1.  x1 = 2, x2 = 1

3.  x1 = 3, x2 = -1

5.  x1 = -1, x2 = 1, x3 = 0
7.  x1 = -1, x2 = 1, x3 = 0
9.  x1 = -3, x2 = 1, x3 = 2, x4 = 1
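Solutions like these are obtained from an LU-decomposition by solving Ly = b with forward substitution and then Ux = y with back substitution. The following is a minimal sketch of those two steps. The factors L and U and the right-hand side b shown here are assumptions (the exercise data are not reproduced in this answer key); they were chosen so that the result agrees with answer 1.

```python
from fractions import Fraction


def forward_substitution(L, b):
    """Solve Ly = b for lower triangular L with nonzero diagonal."""
    y = []
    for i, row in enumerate(L):
        s = sum(row[j] * y[j] for j in range(i))
        y.append((b[i] - s) / row[i])
    return y


def back_substitution(U, y):
    """Solve Ux = y for upper triangular U with nonzero diagonal."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


# Assumed data: A = LU = [[3, -6], [-2, 5]] and b = (0, 1).
L = [[Fraction(3), Fraction(0)], [Fraction(-2), Fraction(1)]]
U = [[Fraction(1), Fraction(-2)], [Fraction(0), Fraction(1)]]
b = [Fraction(0), Fraction(1)]

y = forward_substitution(L, b)
x = back_substitution(U, y)
print(x)
```

Exact fractions are used so that the result can be compared with the tabulated answer without rounding concerns.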

11.  n 


A = LU  = 


2 0 0 

-2  1 0 

2 0 1 


1 I 
0 0 
0 0 


b. 


A = L{DU{  = 


1 

2 

r 

p 

1 

0 

o' 

"2 

0 

o' 

1 

-1 

1 

0 

0 

1 

0 

0 

0 

1 

1 

0 

1 

0 

0 

1 

_0 

0 

1 

c. 

1 

0 

0" 

'2 

1 

-l" 

a = l2u2  = 

-1 

1 

0 

0 

0 

1 

1 

0 

1 

0 

0 

1 

13. 

"l 

0 

0~ 

'3 

0 

O' 

'1 

-4 

2' 

A = 

0 

1 

0 

0 

2 

0 

0 

1 

0 

2 

-2 

1 

0 

0 

1 

0 

0 

1 

15.  x,  = 2l  _M  „_i2 

1 17'  2 17'  3 17 


17. 

1 

-1  ol 

'1 

0 

0" 

'3 

0 

O' 

3 

A = 

0 

0 

1 

0 

2 

0 

0 

1 I 

; 

0 

1 

0_ 

3 

0 

1 

2 

0 

0 1 

19-  (b>  i 

b 

'1 

o' 

a 

b 

- 

1 

c 

d_ 

£ 

a 

1 

0 

ad  — be 
a 

*1=  *2  = -^.  *3  = '- 


True/False  9.1 

(a) False
(b) False
(c) True
(d) True
(e) True

Exercise  Set  9.2 

a.  A3  dominant 

No  dominant  eigenvalue 

3. 


x1 ≈ [0.98058, -0.19612]; x2 ≈ [0.98837, -0.15206]; x3 ≈ [0.98679, -0.16201]; x4 ≈ [0.98715, -0.15977];
dominant eigenvalue: λ = 2 + √10 ≈ 5.16228;
dominant eigenvector: [1, 3 - √10] ≈ [1, -0.16228]
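The iterates in this answer are consistent with the power method with Euclidean scaling. The sketch below is an illustration under stated assumptions: the matrix A = [[5, -1], [-1, -1]] and the starting vector x0 = (1, 0) are not given in this answer key, but they reproduce the listed iterates and the dominant eigenvalue 2 + √10.

```python
import math


def matvec(A, x):
    """Matrix-vector product."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def normalize(x):
    """Scale x to unit Euclidean length."""
    norm = math.sqrt(sum(t * t for t in x))
    return [t / norm for t in x]


def power_method(A, x0, steps):
    """Return the list of normalized iterates x1, x2, ..., x_steps."""
    history = []
    x = x0
    for _ in range(steps):
        x = normalize(matvec(A, x))
        history.append(x)
    return history


def rayleigh_quotient(A, x):
    """Eigenvalue estimate (Ax . x) / (x . x)."""
    Ax = matvec(A, x)
    return sum(p * q for p, q in zip(Ax, x)) / sum(q * q for q in x)


A = [[5, -1], [-1, -1]]   # assumed matrix
x0 = [1, 0]               # assumed starting vector

history = power_method(A, x0, 50)
print([round(t, 5) for t in history[0]])
print(rayleigh_quotient(A, history[-1]))
```

The ratio of the two eigenvalue magnitudes is roughly 0.23 for this matrix, so the iterates settle quickly.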

x,  = [ I1].  A®  = 6;x2  = [ « 


A1^  = 6.6;  X3  ? 


-0.53846 

1 


x4; 


( |~  — 0.53488 


A^  « 6.60555; 


dominant eigenvalue: λ = 3 + √13 ≈ 6.60555;

3 


dominant  eigenvector: 


/26*4/T3 

2 t /l3 


-0.47186 

0.88167 


9. 

13. 


/26  + 4/H 

X2  = [-a8]'X3RJ[-o1929] 

b.  λ^(1) = 2.8, λ^(2) ≈ 2.976, λ^(3) ≈ 2.997

Dominant eigenvalue: λ = 3; dominant eigenvector:

0.1% 

2.99993 


T!] 


0.99180 

1.00000 


Starting  with 


Starting  with 


, it  takes  8 iterations. 


, it  takes  8 iterations. 


0.987151 
■0. 15977  J’ 


, A®  ss  6.60550; 


Exercise  Set  9.3 


1. 


"2" 

II 

o 

1 

0 

3 

3. 

0.39057 

"0.60971" 

h1SS 

0.65094 

, ai  ps 

0 

0.65094 

0.79262 

Sites  1 and  2 (tie);  sites  3 and  4 are  irrelevant 
Site  2,  site  3,  site  4;  sites  1 and  5 are  irrelevant 
Exercise  Set  9.4 

1.  a.  ≈ 0.067 second

b.  ≈ 66.68 seconds

c.  ≈ 66,668 seconds, or about 18.5 hours
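The three times in answer 1 follow from dividing a flop count by a machine speed. The sketch below assumes a cost of (2/3)n^3 + (3/2)n^2 - (7/6)n flops for solving an n × n system and a speed of 10 gigaflops per second, with n = 1000, 10,000, and 100,000. Both the formula and the speed are assumptions, adopted because they reproduce the listed values; the exercise statement is not part of this answer key.

```python
FLOPS_PER_SECOND = 10e9   # assumed machine speed: 10 gigaflops per second


def elimination_flops(n):
    """Assumed flop count for solving an n x n linear system by elimination."""
    return (2 * n**3) / 3 + (3 * n**2) / 2 - (7 * n) / 6


def solve_time_seconds(n):
    """Estimated running time in seconds at the assumed speed."""
    return elimination_flops(n) / FLOPS_PER_SECOND


for n in (10**3, 10**4, 10**5):
    seconds = solve_time_seconds(n)
    print(n, seconds, seconds / 3600)
```

The cubic term dominates, which is why each tenfold increase in n multiplies the time by roughly a thousand.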

9.52  seconds 

b.  0.0014  second 

c.  ≈ 9.52 seconds

d.  ≈ 28.6 seconds

6.67 × 10^5 s for forward phase, 10 s for backward phase
1334

7.  n^2 flops
9.  2n^3 - n^2 flops


Exercise  Set  9.5 

1.  0, 

3.  ^5 


A = 


A = 


A = 


1 

1 

f2 

_2_ 

1 

2 

3 

1 

3 

_2 

3 


1 

1 

1 

2 


1 

/2 


1 


11. 


4 = 


0 

/3 

f2 

'/3  {2 


True/False  9.5 

(a) False
(b) True
(c) False
(d) False
(e) True
(f) False
(g) True

Exercise  Set  9.6 


i 2 0 ho' 
o {2  L° 


J 2_ 

"8  0]  /5 

0 2 I _ 2 1 

f5 


1 

1 

[3/2 

0 

~f2 

f2 

0 

0 

1 

1 

0 

0_ 

_2_ 


0 

0 


0 

1 


1. 


[3/5] 


1 1 
'f2  f2 


4=  0 
ft 

1 1 

& f2 

<[3  {2 


3f2 


2 

3 

I 

3 

_2 

3 

J_ 

f3 

1 

/3 

‘/3 


\{3  0 

ri  o' 

0 {2 

[0  ij 

J_  J_" 

72  \[2 

[1  0]  + /2 


0 

J_ 

f2 

1 


[0  1] 


70,100  numbers  must  be  stored;  A has  100,000  entries 

True/False  9.6 

(a) True
(b) True
(c) False

Chapter  9 Supplementary  Exercises 


1. 


2 0 

—2  1 

2 0 0 

1 2 0 

1 1 2 


-3  1 
0 2 
1 2 3 
0 1 2 
0 0 1 


A=  3,  v = 


f2 

f2 


x5t 

X5f 


0.7100 

0.7041 

1 

0.9918 


[0.7071 
5 [o.7071 


3—  0 JL 

f2  f2 

0 1 0 

-L  0 -4= 


f2 


2 0 
0 0 
0 0 


11. 

1 

1 

2 

2 

12 

0 

6 ' 

1 

1 

4 

-8 

10 

2 

2 

"24 

0 1 

4 

-8 

10 

1 

1 

_ 0 

12j 

12 

0 

6 

2 

2 

1 

1 

2 

2 

1 ]_ 

{2  {2 

'f2  f2 


Exercise  Set  10.1 

a.  y = 3x - 4

b.  y = -2x + 1


2. 


a.  x^2 + y^2 - 4x - 6y + 4 = 0 or (x - 2)^2 + (y - 3)^2 = 9

b.  x^2 + y^2 + 2x - 4y - 20 = 0 or (x + 1)^2 + (y - 2)^2 = 25
3.  x^2 + 2xy + y^2 - 2x + y = 0 (a parabola)
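The circle equations above have the form x^2 + y^2 + c1*x + c2*y + c3 = 0, so three points on the circle give three linear equations in c1, c2, c3. This is equivalent to expanding the determinant condition used in this section. The sketch below solves that system exactly. The three points (2, 6), (2, 0), (5, 3) are an assumption: they lie on the circle of part 2(a), but the exercise data are not reproduced in this answer key.

```python
from fractions import Fraction


def solve(A, b):
    """Solve a square linear system by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[-1] for row in M]


def circle_through(p1, p2, p3):
    """Coefficients (c1, c2, c3) of x^2 + y^2 + c1*x + c2*y + c3 = 0."""
    points = (p1, p2, p3)
    A = [[x, y, 1] for x, y in points]
    b = [-(x * x + y * y) for x, y in points]
    return solve(A, b)


# Assumed sample points on the circle (x - 2)^2 + (y - 3)^2 = 9.
print(circle_through((2, 6), (2, 0), (5, 3)))
```

Collinear points make the system singular, in which case the pivot search in this sketch raises `StopIteration`.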


4. 


5. 


6. 


a.  x + 2y + z = 0

b.  -x + y - 2z + 1 = 0


a.  x .y  z 0 

*1  71  *i  1 _Q 
*2  72  z2  1 
*3  73  z3  1 

b.  x + 2y + z = 0; -x + y - 2z = 0


a.  x^2 + y^2 + z^2 - 2x - 4y - 2z = -2 or (x - 1)^2 + (y - 2)^2 + (z - 1)^2 = 4

b.  x^2 + y^2 + z^2 - 2x - 2y = 3 or (x - 1)^2 + (y - 1)^2 + z^2 = 5


10. 


y x2  x 1 

71  *1  *1  1 

72  *2  *2  1 

73  *3  *3  1 


= 0 


The  equation  of  the  line  through  the  three  collinear  points 

12.  0 = 0 


The  equation  of  the  plane  through  the  four  coplanar  points 

Exercise  Set  10.2 

!•  x\  = 2,  *2  = ■ji  maximum  value  of  z = yy- 

No  feasible  solutions 
Unbounded  solution 

Invest $6000 in bond A and $4000 in bond B; the annual yield is $880.


7/9 cup of milk, 25/18 ounces of corn flakes; minimum cost = 335/18 ≈ 18.6¢


x1 ≥ 0 and x2 ≥ 0 are nonbinding; 2x1 + 3x2 ≤ 24 is binding
b.  x1 - x2 ≤ v for v < -3 is binding and for v < -6 yields the empty set.
x2 ≤ v for v < 8 is nonbinding and for v < 0 yields the empty set.


550 containers from company A and 300 containers from company B; maximum shipping charges = $2110
925 containers from company A and no containers from company B; maximum shipping charges = $2312.50
0.4 pound of ingredient A and 2.4 pounds of ingredient B; minimum cost = 24.8¢

Exercise  Set  10.3 

1.  700 

2*  a.  5 
b.  4 


Ox,  units;  sheep,  unit 

First  kind,  measure;  second  kind,  measure;  third  kind,  measure 
25  25  25 


*1 : 


(a2  + fl3 +-  + «>,) -<»1  ; Xj  = a . _ X1  j = 2_  3 „ 


n — 2 


Exercise  7(b);  gold,  30-y  minae;  brass,  9y  minae;  tin,  14-!-  minae;  iron,  5y  minae 

a.  5x + y + z - K = 0
x + 7y + z - K = 0
x + y + 8z - K = 0

x = 21t/131, y = 14t/131, z = 12t/131, K = t, where t is an arbitrary number

b.  Take t = 131, so that x = 21, y = 14, z = 12, K = 131.

c.  Take t = 262, so that x = 42, y = 28, z = 24, K = 262.


7 9 

Legitimate  son,  577 staters;  illegitimate  son,  422 staters 

Gold,  30-^-  minae;  brass,  9-^-  minae;  tin,  14-^-  minae;  iron,  5-^-  minae 

First  person,  45;  second  person,  37 -1;  third  person,  22-^- 

Exercise  Set  10.4 

a.  S(x) = -.12643(x - .4)^3 - .20211(x - .4)^2 + .92158(x - .4) + .38942

b.  S(.5) = .47943; error = 0%
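The value in part (b) can be reproduced by evaluating the cubic from part (a) at x = 0.5. The sketch below also compares the result with sin(0.5). That the interpolated function is sin x is an assumption, inferred from the constant term .38942 ≈ sin(.4); the exercise statement is not part of this answer key.

```python
import math


def S(x):
    """Cubic spline piece from part (a), written in powers of (x - 0.4)."""
    t = x - 0.4
    return -0.12643 * t**3 - 0.20211 * t**2 + 0.92158 * t + 0.38942


value = S(0.5)
print(round(value, 5))
# Comparison with the assumed underlying function sin(x).
print(abs(value - math.sin(0.5)))
```

To five decimal places the spline value and sin(0.5) agree, which is consistent with the reported error of 0%.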

a.  The  cubic  runout  spline 

b.  S(x) = 3x^3 - 2x^2 + 5x + 1


SOt)  = 


- . 00000042 (x  4- 10) 3 

4= 

000214(x  4=  10) 

4= 

.99815, 

— 10  <x  < 0 

00000024(x)3 

,0000126(x)2 

4= 

,000088(x) 

4= 

.99987, 

0 <x  < 10 

- 00000004 (x  — 10) 3 - 

. 0000054 (x  - 10) 2 

- 

000092(x  — 10) 

+ 

.99973, 

10  <x  < 20 

00000022(x  — 20)3  - 

0000066(x  — 20)2 

- 

.000212(^-20) 

4= 

.99823, 

20  <x  < 30 

Maximum  at  ( x , S(x))  = (3.93,  1.00004) 


5. 

,00000009(x  + 10)3  - 

0000121(x  + 10)2 

4= 

.000282(x  4=  10) 

4= 

.99815, 

— 10  <x  < 0 

S(x)  = { 

00000009(x)3 

0000093(x)2 

4- 

.00007000 

+ 

.99987, 

0 <x  < 10 

. 00000004 (x  — 10) 3 - 

0000066(x  — 10)2 

- 

,000087(x  — 10) 

+ 

.99973, 

10  <x  < 20 

00000004(x  — 20)3  - 

0000053(x  — 20)2 

- 

. 000207 (x  — 20) 

4= 

.99823, 

20<x<30 

Maximum  at  ( x , S(x))  = (4.00,  1.00001) 


SW  = 


-4xJ  4=  3x 


0<x  <0.5 


4x3-12x2  + 9x-l  0.5  <x  < 1 


b , [2  - 2x  0.5  <x  < 1 

S(x)  = \2-2x  1 <x  < 1.5 

The  three  data  points  are  collinear. 


(b)  r 


4 

1 

0 

0 • ■ 

■ • 0 

0 

0 

l" 

’ M i 

yn- i 

2/1 

4= 

72 

1 

4 

1 

0 ■ ■ 

■ ■ 0 

0 

0 

0 

m2 

7 i 

2/2 

+ 

73 

0 

1 

4 

1 • ■ 

■ ■ 0 

0 

0 

0 

m3 

6 

72 

2/3 

4= 

74 

h2 

0 

0 

0 

0 ■ ' 

■ • 0 

1 

4 

1 

Mn- 2 

yn-  3 

- 2/„_2 

4- 

7m— 1 

1 

0 

0 

0 ■ ' 

■ • 0 

0 

1 

4 

Mn- 1 

yn-  2 

- 2/„_i 

4= 

71 

'2 

1 

0 

0 ■ ' 

■ ■ 0 

0 

0 

f 

’ Mi 

- 

hy\  - 

71  4 

72 

1 

4 

1 

0 ■ ' 

■ • 0 

0 

0 

0 

m2 

y\  - 

272  4 

73 

0 

1 

4 

1 • ■ 

■ • 0 

0 

0 

0 

m2 

6 

/2  - 

273  + 

74 

h2 

0 

0 

0 

0 • ' 

■ • 0 

0 

4 

1 

Mn- 1 

yn-  2 - 

27 

PM-1  ■+ 

7m 

0 

0 

0 

0 • • 

■ ■ 0 

1 

1 

2 

Mn 

^«-l  - 

7m  4 

hy„ 

Exercise  Set  10.5 

1.  o 


' .4 " 

. x®  = 

".46" 

(3r)_r.454" 

, x^  = 

'.4546' 

— [454541 

_.6_ 

_.54_ 

[.546 

_.5454_ 

' ‘ [.54546 

P is  regular  since  all  entries  of  P are  positive;  q = 


a. 

.7" 

'.23' 

'.273' 

x®  = 

.2 

. x©  = 

.52 

, x®  = 

.396 

.1 

.25 

.331 

b. 


P is  regular,  since  all  entries  of  P are  positive:  q = 


22 

72 

29 

72 

21 

72 


3-  a.  _9_ 

17 

_8_ 

17 

b.  [26 

45 

19 

45 

c.  J_ 
19 

19 

12 

19 


P”  = 


Pn- 


■-(«” 

0 0] 

1 lj 


, n = 1, 2, .... Thus, no integer power of P has all positive entries.


as  n increases,  so 


for  any  X(P)  as  n increases. 


The  entries  of  the  limiting  vector 


are  not  all  positive. 


6. 

"i  i r 

1 

2 4 4 

3 

P2  = 

1 1 1 
4 2 4 

has  all  positive  entries;  q = 

1 

3 

1 1 1 

1 

4 4 2 

3 

7.  54 1/6% in region 1, 16 2/3% in region 2, and 29 1/6% in region 3

Exercise  Set  10.6 


a.  0 0 0 1 

10  11 
110  1 
0 0 0 0 

b. Toiioo’ 

0 0 0 0 1 

10  0 10 

0 0 10  0 

0 0 10  0 

c.  [o  1 0 1 0 o' 

1 0 0 0 0 0 

0 10  111 

0 0 0 0 0 1 

0 0 0 0 0 1 

0 0 10  10 


b.  1- 

- step: 

P2 

2- 

- step: 

p 1 

- Pa 

—*  P2 

pi 

^P3 

—*  P2 

3- 

- step: 

p 1 

~~*P2 

- Pi 

p 1 

P3 

^P4 

—*  P2 

p 1 

* P 4 

P3 

—*  P2 

c.  1 “ 

- step: 

Pi 

* P 4 

2- 

- step: 

p 1 

* P3 

-*^4 

3- 

- step: 

Pi 

P2 

-*^1 

-fil 

p 1 

- P4 

—*  P3 

-^4 

(a)  1 0 0 0 0 

0 10  0 0 

0 0 110 
0 0 12  1 

0 0 0 1 2 

The ijth entry is the number of family members who influence both the ith and jth family members.
5.  a.  {P1, P2, P3}

b.  {P3, P4, P5}

c-  {Pi,  P 4.  P&.  P%)  and  {P 4.  P5.  Pi) 

a.  None 

b.  {P2, P4, P6}

Power of P1 = 5
Power of P2 = 3
Power of P3 = 4
Power of P4 = 2

8.  First,  A;  second,  B and  E (tie);  fourth,  C;  fifth,  D 
Exercise  Set  10.7 
1.  a.  -5/8 

b.  [0  1 0] 

c.  [1  0 0 0]T 


0 0 11 
10  0 0 
0 10  1 
0 10  0 


Let  A - 


-[i :] 

[0  1],  q 


for  example. 


[0  1 0],  q 


p = [0  0 1],  q = 


. v = 2 


p =[0100],  q = 


* [5  31  * 

p =[s  n q = 

b. 

* r 2 ii 

p =[33}  q = 

C'  P*=H  0],  q*  = 

-•-[HI 

e. 

* Tj_  101 

p [l3  13  J’ 

5. 

* m xi  * 

p [20  20 J' q 

Exercise  Set  10.8 


, v=  — 2 


. 27 


70 

' 3 


v = 3 


19 

5 


q = 


29 
' 13 


20 


1. 


Use  Corollary  10.8.4;  all  row  sums  are  less  than  one. 
Use  Corollary  10.8.5;  all  column  sums  are  less  than  one. 


c. 

"2" 

1.9 

Use  Theorem  10.8.3,  with  x = 

1 

>Cx  = 

.9 

1 

.9 

E 2 has  all  positive  entries. 

Price  of  tomatoes,  $120.00;  price  of  corn,  $100.00;  price  of  lettuce,  $106.67 
5 $1256  for  the  CE,  $1448  for  the  EE,  $1556  for  the  ME 

6.  (b)  542. 

503 

Exercise  Set  10.9 

The  second  class;  $15,000 

2 $223 

3 1:1.90:3.02:4.24:5.00 

5 ^ / (g\  1 + g2_1  + ' ' ’ + Sft-1 ) 

6 1:2:3:  • • -:«-l 

Exercise  Set  10.10 

!•  - To  1 1 0_ 

0 0 11 

0 0 0 0 

oil 

2 2 


0 0 


1 1 
2 2 
0 0 0 0 


'-2  -1 

-1  -2" 

-1  -1 

0 0 

3 3 

3 3 

d. 

vo 

00 

o 

1.366  .500" 

0 -.500 

.366  .866 

0 0 

0 

0 

(b) 

(0,0,0),  (1,0,0), 

(1.1.0) 

(c) 

(0,0,0),  (1,  .6,  0), 

(1,  1.6,0),  (0,  1,0) 

a. 

"l  0 0" 

0-10 
0 0 1 

b.  _ 

-10  0 
0 1 0 
0 0 1 


'1  0 o' 

0 1 0 

0 0-1 


4. 


5. 


M i = 


o 

o 

' 1 1 

r 

r 

0 2 0 
0 0 i 

, M2  = 

2 2 

0 0 • ■ 

2 

• • 0 

, m3  = 

o 

o 

• • 0 

- 

1 0 
0 cos  20 
0 sin  20 


0 

—sin  20 
cos  20 


b. 


cos  ( — 45  ) 0 sin  ( — 45  ) 

'0  -1  0" 

ma  = 

0 1 0 

, m5  = 

1 0 0 

—sin  (—45)  0 cos  ( — 45) 

0 0 1 

Pf  = M5MAM3(M{P+M2) 


a. 

‘.3 

0 

0" 

"l 

0 

0 

'1 

1 • ■ 

■ • f 

M\  = 

0 

.5 

0 

, M2  = 

0 

cos  45 

—sin  45 

, m3  = 

0 

0 • ■ 

■ ■ 0 

0 

0 

1 

_0 

sin  45 

o 

cos  45 

0 

0 ■ ■ 

■ ■ 0 

m4  = 

cos  35  0 sin  35 

0 1 0 

II 

* 

cos  ( — 45)  —sin  ( — 45)  0 
sin  ( — 45)  cos  ( — 45)  0 

—sin  35  0 cos  35 

O 

O 

o 

o 

o 

o 

o 

m6  = 

0 0 • • • 0 

, m7  = 

0 1 0 

11  • • • 1 

0 0 1 

b.  Pt  = M1{M5MA{M2M{P+M3)  + M6) 


cos  $ 

0 

sin/? 

cos  a 

—sin  a 

0" 

R 1 = 

0 

1 

0 

. R 2 = 

sin  a 

cos  a 

0 

—sin/? 

0 

cos  $ 

0 

0 

1 

cos  9 

0 

sin0‘ 

cos  a 

sin  a 

o' 

r3  = 

0 

1 

0 

, Ra  = 

—sin  a 

cos  a 

0 , 

—sin  6 

0 

cos  9 

0 

0 

1 

cos  $ 

0 

—sin/? 

r5  = 

0 

1 

0 

sin/? 

0 

cos  $ 

7. 


a. 


M = 


10  0 
0 1 0 70 
0 0 1 zo 
0 0 0 1 


b.  10  0-5 
0 10  9 

0 0 1-3 
0 0 0 1 


Exercise  Set  10.11 

1.  * 


t = 


0 

1 

4 

1 

4 

o 4 4 


1 

1 

0 

4 

4 

_ _ 

"0" 

0 

0 

1 

t\ 

1 

4 

1 

*2 

+ 

2 

0 

0 

*3 

0 

4 

1 4 

1 

1 

1 

0 

2 

4 

4 

V 

’ 3 " 

’ 7 " 

~15~ 

0 

8 

16 

32 

64 

1 

5 

11 

23 

47 

2 

t(2)  = 

8 

, t©= 

16 

,t«>= 

32 

64 

0 

’ 

1 

3 

7 

, 

15 

1 

8 

16 

32 

64 

2 

5 

11 

23 

47 

8 

16 

32 

64 

t®= 


for  ti  and  1 3,  —12.9%;  f°r  £2  and  5.2% 


t®- 


2 1 
2 

3. 


[3 

5 2 

5 

4 2 

5 

4 

3lr 

4 4 

4 

4 4 

4 

4 

4j 

r 13 

18 

9 

22 

13 

7 

21 

16 

10] 

[16 

16 

16 

16 

16 

16 

16 

16 

16  J 

Exercise  Set  10.12 


a-  xf  = (1.40000,  1.20000) 
x®  = (1  41000,  1.23000) 
x®  = (1.40900,  1.22700) 
xf  = (1.40910,  1.22730) 
xf  = (1  40909,  1.22727) 

xf  = (1.40909,  1.22727) 
Same  as  part  (a) 

c-  xf  = (9.55000,  25.65000) 
xf  = (.59500,  — 1.21500) 
xf  = (1.49050, 1.47150) 
xf  = (1.40095, 1.20285) 
xf  = (1.40991, 1.22972) 
xf  = (1.40901, 1.22703) 


64 

1 


64 

64 


4.  **  = (1.1),  ^ = (2.0),  *5  = (1.1) 
x7  4=xg  4-X9  = 13.00 
X4  + X5  = 15.00 
xi  +X2  + *3  = 8.00 
82843(x6  + xg)  + .58579^9  = 14.79 
1.41421  (x3  +xj  + x7)  = 14.31 
. 82843  (x2  + x4)  + . 58579*  1 = 3. 8 1 
*3  4=  *6  + *9  = 18.00 
*2  + *5  + *8  = 12.00 
*1  +X4  + X7  = 6.00 

. 82843  (x  2 + x6)  + .58579*3  = 10.51 
1.41421  (xi  +X5  + X9)  = 16.13 
. 82843  (x4  + x8)  + . 58579*7  = 7. 04 

X'l  =bxg  + X9  = 13.00 
X4  + X5  + xg  = 15.00 

xi  +*2  +X3  = 8.00 

04289(x3  4=  X5  + x7)  4=  ,75000(xg  + xg)  4=  .61396*9  = 14.79 
91421(x3  +X5  +X7)  + .25000(*2  + *4  + *^  + xg)  = 14.31 
04289(x3  + X5  + x7)  4=  .75000  (*2  4=  x4)  4=  .61396xi  = 3.81 

*3  +*6  + X9  = 18.00 

*2  4=*5  4=xg  = 12.00 

+X4  + X7  = 6.00 

04289(xi  +x5  4-x9)  4-  ,75000(x2  +x6)  4-  ,61396x3  = 10.51 
.91421  (xi  +X5  + X9)  4=  .25000(*2  + X4  + X6  4-xg)  = 16.13 
,04289(xi  +X5  + X9)  4-  .75000(x4  + xg)  4=  .61396x7  = 7.04 


Exercise  Set  10.13 

1. 

TA 


>(Mi  ;k 


Si 


i = 1,  2,  3,  4,  where  the  four  values  of 


Si 


and 


=ln(4)  / In 


»)= 


s ≈ .47; d_H(S) ≈ ln(4)/ln(1/.47) = 1.8...; rotation angles: 0° (upper left); -90° (upper right); 180° (lower left); 180° (lower right)

3.  (0,  0,  0),  (1,  0,  0),  (2,  0,  0),  (3,  0,  0),  (0,  0,  1),  (0,  0,  2),  (1,  2,  0),  (2,  1,  3),  (2,  0,  1),  (2,  0,  2),  (2,  2,  0),  (0,  3,  3) 

(i) s = 1/3; (ii) all rotation angles are 0°; (iii) d_H(S) = ln(7)/ln(3) = 1.771.... This set is a fractal.

(i) s = 1/2; (ii) all rotation angles are 180°; (iii) d_H(S) = ln(3)/ln(2) = 1.584.... This set is a fractal.

(i) s = 1/2; (ii) rotation angles: -90° (top); 180° (lower left); 180° (lower right); (iii) d_H(S) = ln(3)/ln(2) = 1.584.... This set is a fractal.

(i) s = 1/2; (ii) rotation angles: 90° (upper left); 180° (upper right); 180° (lower right); (iii) d_H(S) = ln(3)/ln(2) = 1.584.... This set is a fractal.


s=  .8509..., 0=  -2.  69°... 


(0.766,  0.996)  rounded  to  three  decimal  places 
7.  d_H(S) = ln(16)/ln(4) = 2

8.  ln(4)/ln(4/3) = 4.818...

9.  d_H(S) = ln(8)/ln(2) = 3; the cube is not a fractal.

10.  k = 20; s = 1/3; d_H(S) = ln(20)/ln(3) = 2.726...; the set is a fractal.
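The dimensions quoted in this exercise set all follow one pattern: for a self-similar set made of k copies of itself, each scaled by a factor s, the value is ln(k)/ln(1/s). The short sketch below evaluates that expression for the (k, s) pairs that appear above; the function name is hypothetical.

```python
import math


def similarity_dimension(k, s):
    """ln(k) / ln(1/s) for k congruent copies, each scaled by the factor s."""
    return math.log(k) / math.log(1 / s)


# (k, s) pairs taken from the answers above.
for k, s in [(7, 1 / 3), (3, 1 / 2), (16, 1 / 4), (8, 1 / 2), (20, 1 / 3)]:
    print(k, s, similarity_dimension(k, s))
```

A whole-number result, as for the square (k = 16, s = 1/4) and the cube (k = 8, s = 1/2), is what the answers use to conclude that a set is not a fractal.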


1.888... 


11. 


Initial  set 


First  iterate 


Second  iterate 


Third  iterate 
Fourth  iterate 


dH(X)  =ln(2)  /ln(3)  = 0.6309... 

Area of S0 = 1; area of S1 = 8/9 = 0.888...; area of S2 = (8/9)^2 = 0.790...; area of S3 = (8/9)^3 = 0.702...; area of S4 = (8/9)^4 = 0.624...


Exercise  Set  10.14 

1.  Π(250) = 750, Π(25) = 50, Π(125) = 250, Π(30) = 60, Π(10) = 30, Π(50) = 150, Π(3750) = 7500, Π(6) = 12, Π(5) = 10
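These periods can be reproduced by brute force. The sketch below assumes that Π(p) denotes the smallest positive n for which the nth power of the cat-map matrix C = [[1, 1], [1, 2]] is congruent to the identity modulo p, which is the period of the map on a p × p pixel grid. That reading of the notation and the function name are assumptions; the values it produces agree with the list above.

```python
def cat_map_period(p):
    """Smallest n >= 1 with C^n congruent to I (mod p), where C = [[1, 1], [1, 2]]."""
    a, b, c, d = 1 % p, 1 % p, 1 % p, 2 % p   # entries of C^1 reduced mod p
    identity = (1 % p, 0, 0, 1 % p)
    n = 1
    while (a, b, c, d) != identity:
        # Multiply the current power by C on the right, reducing mod p.
        a, b, c, d = (a + b) % p, (a + 2 * b) % p, (c + d) % p, (c + 2 * d) % p
        n += 1
    return n


for p in (5, 6, 10, 25, 30, 50, 125, 250):
    print(p, cat_map_period(p))
```

The loop always terminates because C has determinant 1 and is therefore invertible modulo every p, so its powers must eventually return to the identity.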

c (0.  0)  > ; one  l-.y.l,  {(f . ")  (f . f } (".  f )}  »'»  {(f  ■ »)■  (?■?}(«' °)’  (I’  I)}  “d  { ("■  f } (f  !)'  (°’  ?)'(«'  !)} 1 1 

- ^ {hm?-  ?)•  (!•  f )•  (§■?)■  (?•  a a- 1}  (»■  ?)■  (?■  i } (i-  (t  • t}  (?•  ?)(?•?)}  - 

{ (f »)■  (f  i)  (!•  !)•  (!•  !)•  (?•  !)•  (£•  i>  (!■  »)•  (I-  !)•  (f  !)•  (f  i)  (!•  !)■  (!■  !)}■  "<«  - » 

3.  (a) 3, 7, 10, 2, 12, 14, 11, 10, 6, 1, 7, 8, 0, 8, 8, 1, 9, 10, 4, 14, 3, 2, 5, 7, 12, 4, 1, 5, 6, 11, 2, 13, 0, 13, 13, 11, 9, 5, 14, 4, 3, 7, ...

(5,  5),  (10,  15),  (4,  19),  (2,  0),  (2,  2),  (4,  6),  (10,  16),  (5,  0),  (5,  5),... 


The  first  five  iterates  of 


Mr4 


(w-tjtM 


3_ 

101  ’ 101 


1 P-  -M  I J2-  JU 
/ [m  ’ ioi  / [m  ’ ioi  ) 


, and 


Uoi ! 


55  1 

101 


(b) 


The  matrices  of  Anosov  automorphisms  are 


3 2 

1 1 


and 


5 7 
2 3 


The transformation effects a rotation of S through 90° in the clockwise direction.


(0. 1) 


(1. 1) 


IV 


(0. 1/2) 


(1.1/2) 


phi  m 


(0.0) 


In  region  I:  | b = 


(1.0) 


(0.1)  (1/2,1)  (1.1) 

III*  f 

IV  II* 

(0.0)  (1/2,0)  (1.0) 


; in  region  II 


: .*]  = [_!}  “--egionni:  [l]  = [_]} 


in  region  IV: 


-1 

-2 


0-,  — j and  j^-,  -jJ  form  one  2-cycle,  and  j-^-,  j j and  |-=r,  ^ ) form  another  2-cycle. 


Begin with a 101 × 101 array of white pixels and add the letter 'A' in black pixels to it. Apply the mapping to this image, which will scatter the black pixels
throughout the image. Then superimpose the letter 'B' in black pixels onto this image. Apply the mapping again and then superimpose the letter 'C' in black pixels
onto the resulting image. Repeat this procedure with the letters 'D' and 'E'. The next application of the mapping will return you to the letter 'A' with the pixels for
the letters 'B' through 'E' scattered in the background.

Exercise  Set  10.15 


1. 


GIYUOKEVBH 

SFANEFZWJH 


H 

t invei 

*-U 


12  7 

23  15 


Not  invertible 


d.  Not  invertible 

e.  Not  invertible 

15  12 
21  5 


WE  LOVE  MATH 


Deciphering  matrix  = 


7 15 
6 5 


; enciphering  matrix  = 


7 5 
2 15 


THEY SPLIT THE ATOM
I HAVE COME TO BURY CAESAR
a.  010110001 


b. 


0 1 1 
1 1 1 
1 0 1 


A is invertible modulo 29 if and only if det(A) ≢ 0 (mod 29).
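The invertibility criterion can be made concrete for 2 × 2 matrices: a matrix has an inverse modulo m exactly when its determinant has a multiplicative inverse modulo m, and for a prime modulus such as 29 that holds for every nonzero residue. The sketch below is an illustration with hypothetical helper names. As a check it recovers the enciphering matrix listed earlier in this exercise set from the deciphering matrix, working modulo 26.

```python
def inverse_mod(a, m):
    """Multiplicative inverse of a modulo m, or None if it does not exist."""
    a %= m
    for candidate in range(1, m):
        if (a * candidate) % m == 1:
            return candidate
    return None


def matrix_inverse_mod(A, m):
    """Inverse of a 2 x 2 integer matrix modulo m, or None if not invertible."""
    (a, b), (c, d) = A
    det_inv = inverse_mod(a * d - b * c, m)
    if det_inv is None:
        return None
    # Adjugate times the inverse of the determinant, reduced modulo m.
    return [[(d * det_inv) % m, (-b * det_inv) % m],
            [(-c * det_inv) % m, (a * det_inv) % m]]


deciphering = [[7, 15], [6, 5]]
print(matrix_inverse_mod(deciphering, 26))
# Modulo the prime 29, every nonzero determinant has an inverse.
print(all(inverse_mod(d, 29) is not None for d in range(1, 29)))
```

The linear search in `inverse_mod` is adequate for alphabet-sized moduli; the extended Euclidean algorithm would be the usual choice for larger ones.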

Exercise  Set  10.16 


2. 


— 4 (1 


5) 


n+ 1 


Oo  -cq) 


b-i 


M+l 


(ao-co) 


a”~*\ 

n=\,  = i 

C”^4 


^2«+l  = i=t~  r\ sn  (2ao-bo-4cQ) 
3 6(4) 

^2m+i  = 3 “ yr(2aQ  - ^0  - 4co) 

c2m+1  = 0 


* = 0,  1,2,... 


a2" = n + 6(4)«  (2a° - b°  ~ 4c») 
*2„  = ^ 

C2”  = — *6(4)”"  (2a°  — 4co) 


» = 1,2. 


4 1 

Eigenvalues:  Aj  = 1,  A2  = eigenvectors:  ei  = 

12  generations;  .006% 


e2  — 


1 

= 1 


2 + 3'^T[(-3_'^)(l  + '^)"+1  + (_3  + '^)(1_'^)”+1] 


I 

3 4- 

1 


„+l  [(1  + |^5)  +(1-/5)  ] 

krl(.  i + ^)"+1  + 0-^)"+1] 


dM  + l 


^ + ^-^jT[(-3-/5)(1  + /5)”+  +(-3  + ^5)(l-/5) 


,«+L 


10  0 0 
0 0 0 0 
0 0 0 0 
0 0 0 1 


Exercise  Set  10.17 

1.  a 

a.  o 

A»  = 2-  Xl  = 


b x®  = 


100 

50 


x^  = 


175 

50 


c x<®  = Zx®  = 


857 

285 


x®  = 


[“] 


I x(^  _ [382]  (5)  _r570l 

J’  — 125 J’  ~ |_  19 1 J 


, x®~Aix® 


= P551 

|_287  J 


2.375 

1.49611 

Exercise  Set  10.18 


1. 


Yield  = 33-^-%  of  population;  xi  = 


Yield  = 45.8%  of  population;  xi  = 


; harvest  57.9%  of  youngest  age  class 


xi  = 


1.000 

2.090 

.845 

.845 

.824 

.824 

.795 

.795 

.755 

.755 

.699 

, Ixi  = 

.699 

.626 

.626 

.532 

.532 

0 

.418 

0 

0 

0 

0 

0 

0 

1.090  I .418  igg 
7.584 


4 hj=(R—\)f  (ajb\b2  ■ 

5 jh  _ «1  + a2b\  + • ■ • 4-  {aj-\b\b2  • • • 67-2)  ~ 1 


■ ■ ■ bj- 1 + 

Exercise  Set  10.19 


*/-l+  • • ■ +a„bib2 • ■ 
-i&i&2  • • -^y-2)  ~ ^ 
+ ■ ‘^7-2 


-j-  + 4 cos  t + cos  2t  + “Cos  3^ 


2,  T2 

3 


cos  ^ 


■t  COS  TjJ-f  + 4r  cos  4s H 
)2  l 'T2  I 


— cos  —t 
d2  T 


# , 1 • 4?r  f , 1 • 6tt  < , 1 • 8tt 
sin  — / + — sin  -^-t  + — sin  + — sin  -^-t  I 


-£(■ 

* + 2 

T ST  i 1 2 vt  ■ 1 . . . 6?rt  . 1 lOirt 

— ■ cos  -=-  + — cos  -=-  + — - cos 

T 62  7 102  7 


^ ~ -f  "t-  sin  t “ cos  2^  * — — %—  cos  At 
n 2 3 ¥ 15?r 

4, 


cos  3^  — ■ 


1 


(2»-l)(2»4-l) 
1 


cos  nt  J 
2nxt 


(2»)‘ 


2 cos  r 


Exercise  Set  10.20 

a.  Yes;  v = yvi  4=  -|v2  4=  -|v3 
No;  v = -jvi  -h  jv2  — yv3 
c-  Yes;  v = ^v\  4=  ^v2  + 0v3 
d.  Yes;  v = -jyVi  + JJV2  + ^-v3 

m = number of triangles = 7, n = number of vertex points = 7, k = number of boundary vertex points = 5; Equation (7) is 7 = 2(7) - 2 - 5.

3.  w = Mv + b = M(c1v1 + c2v2 + c3v3) + (c1 + c2 + c3)b

= c1(Mv1 + b) + c2(Mv2 + b) + c3(Mv3 + b) = c1w1 + c2w2 + c3w3


4. 


b. 


V3 


V4 

V5 


v6 


a. 

M = 

_1 

2" 

b = 

T 

_0 

1_ 

2 

b. 

M = 

"3 

- 

r 

, b = 

J1 

O' 

_1 

1_ 

L 

1_ 

c. 

M = 

_1 

0" 

b = | 

2' 

_0 

1_ 

3_ 

d. 

"1 

1 

1 ' 

M = 

2 

, 

b = 

2 

_ 2 

0 

•1_ 

Two  of  the  coefficients  are  zero. 

At  least  one  of  the  coefficients  is  zero, 
c.  None  of  the  coefficients  are  zero. 

a-  JVI  + JV2  + ±v3 

b'  [7] 
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